I've been using Midjourney extensively over the last month. I'll include some prerequisite images here that I managed to cobble together with words on MJ, just to show you what's fun about using it:
I just had a few thoughts on AI art in general, specifically as it relates to Midjourney, as I find it a more fascinating project with fewer limitations than Dalle.
First, if you aren't aware of what AI art is in general, it's this broad term that refers to imagery created by algorithms that work in a similar way to how search engines work. You enter a term, and it generates a response based on images that it has “studied” and learned correspond with certain words.
For example, this prompt:
“comicbook cover of cowboy, cigarette in mouth, illustrated by Frank Miller, detail portrait, glamor, sharp focus, black and white, frank miller, sin city”
Generates this image:
Or rather, it generates a loose interpretation that you then fine-tune through variation. It understands the stereotypical "style" of Frank Miller, it understands images related to his work "Sin City" because it has seen them, and it can create an altogether new image, never imagined by the man who crafted that style, with those ideas merged together.
I don't want to focus on the bigger differences between the engines (MJ vs. Dalle), but the simpler version is that Dalle doesn't really like making art. It likes to represent literally what you give it.
Here's a good example.
This prompt: man in cloak staring at infinite repeating patterns, endless lights, magic
on Dalle generates this imaginative yet fairly basic image:
while Midjourney creates this artistic wonder:
That's the primer on what I'm talking about. The main deal that interests me is that people created Midjourney. It is trained, brought up and unleashed by man. And now it needs interpretation.
Using it requires study. The input, language, is not interpreted by a person. It is interpreted by an algorithm.
The end result is that it can only be understood by usage. It is not a mathematical input. Saying “an orange on a wooden table” does not create a specific image. It doesn't tell it style, size, shape, color, anything. So it decides things. It decides that a lot of the time fruit on tables is shot this way, so I will draw it this way, and it's typically framed this way, so I will draw it this way.
There's a long-standing thread on the Midjourney Discord discussing what is called “The three basket problem”. It is a study of the algorithm.
The problem is presented thusly:
There are three baskets. The first one is filled with blueberries, the second one is filled with apples, the last one is filled with strawberries.
Midjourney cannot create this image, not when it is described this way. Dalle can (some of the time), because it is a less imaginative engine. It likes to make specific things.
Dalle can make things like this:
While MJ tends to produce grotesque (or, depending on your perspective, interesting) things like this:
The interesting thing is how this all works. There is no easy way to do this. The closest anyone has come looks like this:
“three discrete glass jars, each jar contains discrete color, contents blue balls are blue, contents green balls are green, contents red balls are red, three glass jars only --s 1250 --ar 3:2 --testp” and generates this:
Which is still not perfect by any means. It is not how a human would describe this picture. We'd say something like "three jars of colored balls", but if you're just shoving those words at a computer, it might try to make jars made OUT of colored balls, or just throw randomly colored balls into a jar based on an image of M&Ms it remembers seeing once.
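To make the shape of that prompt a little more concrete, here is a rough Python sketch that pulls the jar prompt apart into its two halves: the free-form description and the trailing "--" flags. This is not how Midjourney parses anything internally. The helper name split_prompt is made up for illustration, and it assumes the flags all sit at the end of the prompt, as they do in the example above.

```python
from typing import Dict, Tuple, Union


def split_prompt(prompt: str) -> Tuple[str, Dict[str, Union[str, bool]]]:
    """Hypothetical helper: split a prompt into description text and --flags.

    Assumes flags only appear at the end of the prompt. This is an
    illustration, not Midjourney's actual parser.
    """
    tokens = prompt.split()
    text_tokens = []
    params: Dict[str, Union[str, bool]] = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            name = tok[2:]
            # A flag takes a value only if the next token is not another flag.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                params[name] = tokens[i + 1]
                i += 2
            else:
                params[name] = True
                i += 1
        else:
            text_tokens.append(tok)
            i += 1
    return " ".join(text_tokens), params


prompt = (
    "three discrete glass jars, each jar contains discrete color, "
    "contents blue balls are blue, contents green balls are green, "
    "contents red balls are red, three glass jars only "
    "--s 1250 --ar 3:2 --testp"
)
text, params = split_prompt(prompt)
print(text)
print(params)
```

The flags come out as a tidy little dictionary, which is about as "mathematical" as the input ever gets. Everything before them is still just words, and no amount of parsing tells you what the engine will decide to do with those.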
It just-- it's interesting. It is like trying to commune with some otherworldly entity, but all the pieces of it are just code crafted by a man sitting at an uncomfortable desk in bad lighting, inevitably.
It is a microcosm of the systems we construct for ourselves all the time. People say the American justice system is a nightmare to navigate and wield to its full potential, but we made it. We shoved it together with rules and laws and traditions. Yet there is an entire profession of people who hyper-focus on one specific sub-aspect of navigating law and justice, and we respect their ability to captain these waters enough to give them special names and papers certifying they understand that which normal people crafted.
Systems cannot be understood when you work within them, as is often noted, but even when you can observe the system on a notepad, look at its database, see how it thinks, functions, acts, it still requires interpretation. It still requires experiments and bizarre jargon to act as it is intended to.
And we still cannot find a way to have three baskets with three distinct objects.