Friday, November 03, 2023

Great Uncle Tubby's Beard!

 ... won't replace "Great Caesar's ghost" as an exclamation any time soon, but there it is.

I got Dall-E to generate an image of Uncle Tubby and I loved it. The only problem was, I asked over and over again for him to be clean-shaven.

Dall-E also turned it into a very nice logo for future product lines like sausages, tobacco and fruit wines.

To answer a question both Tim and I have had about just what in tarnation that dag-blasted thing is doing when it ignores simple requests, I got this answer from a young AI expert at work.

Under the hood, DALL-E 2/3 is basically a multi-modal extension of GPT. Think of a space where words + image parts reside together. So tokens ("words" and "image parts") that are related in your training data are neighbors. Based on your prompt, it gives you text tokens or image tokens back. Then, the tokens are put together based on data it has seen before.

It's just grabbing image components it associates with words it sees in your prompt and gluing them together with a very modest amount of thought. The more complex the scene, the more likely it is that Dall-E will get everything confused. It certainly explains why my knights are sometimes wearing the damsel's dress and the damsel is waving a sword around like a loon.
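For the curious, here is a toy sketch of the "neighbors" idea from the explanation above. It is not how Dall-E is actually built. The two-dimensional coordinates, the `SPACE` table and the `nearest_image_part` function are all made up for illustration; the only point is that a word and the image parts it tends to appear with sit close together, so looking up a word pulls in whatever is nearby.

```python
import math

# Hypothetical, hand-picked 2-D coordinates. A real model learns
# positions in a space with far more dimensions from its training data.
SPACE = {
    ("word", "knight"): (0.90, 0.10),
    ("image", "armor"): (0.85, 0.15),
    ("image", "sword"): (0.70, 0.30),
    ("word", "damsel"): (0.10, 0.90),
    ("image", "dress"): (0.15, 0.85),
}


def nearest_image_part(word):
    """Return the name of the image part closest to the given word."""
    origin = SPACE[("word", word)]
    candidates = [
        (math.dist(origin, xy), name)
        for (kind, name), xy in SPACE.items()
        if kind == "image"
    ]
    # min() compares the distances first, so the closest part wins.
    return min(candidates)[1]


if __name__ == "__main__":
    for word in ("knight", "damsel"):
        print(word, "->", nearest_image_part(word))
```

In this toy version each word finds its own outfit. A prompt that mixes several words gives the real system many more neighbors to choose from, which is presumably where the mix-ups creep in.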

3 comments:

tim eisele said...

I like the logo, it did an excellent job on that. The full-resolution Uncle Tubby is technically good, but it still has that weird hyper-real quality of AI art that I find a bit off-putting. The resolution is too good, making the image so sharp that it goes right through "realistic" and out the other side back to the "uncanny valley" again. This is aggravated by the fact that the AI image apparently has unlimited depth-of-field, so everything is equally in focus no matter how far away it is supposed to be.

From what your AI expert is saying, it sounds like getting results from the AI is a lot more like, say, breeding roses than anything else. You're expected to just keep throwing together combinations until you get a result you like.

I wonder if the best way to manage these things would be with a combination of the AI to generate a base image that is close to what you want, and then use something like Photoshop or Blender to manipulate it as desired. Then you could have the figure you like, and re-pose it against whatever background, rather than re-throwing the dice on another try with the AI.

K T Cat said...

AI is like a drug. It gives you quick results with very little effort. I know Photoshop pretty well and thought of composing a scene of people taken from Dall-E to get the desired result, but it would take more time to do one such scene than it does to create 50 new images from Dall-E.

It's an ambition-killer.

Mostly Nothing said...

Looks kind of like Alan Hale Jr. with a beard.

Tim, you're right, AI pictures are hyper real, over the top.