How can someone sit in front of a text area and explain what they want in enough detail that the AI is gonna produce something good?
Hi, I'm Justin Jackson. Here, I'm sharing brief thoughts on building a better life, bootstrapping, improving society, growing older in tech, being a dad...
My friend Adam Wathan has a new podcast called Adam's Morning Walk. And in the last episode, he talked about how he's thinking about building with AI.
Adam Wathan:When I think about our target customer, which is like non-developers who want their things to look really good but aren't designers, I don't know. That workflow doesn't feel like the right workflow to me. Like, how can someone who's not a designer effectively sit in front of a text area and explain what they want in enough detail that the AI is gonna produce something good? I've seen some cool videos of designers using AI to build stuff. That feels sort of way out of reach to me in terms of what a developer could do, because they're sitting there with the LLM and prompting something into existence and then going back and forth with it, back and forth, saying change this, try this, change this, try this. And it looks like a really cool workflow if you're a designer, because they can iterate really rapidly, but they are still designing.
Adam Wathan:They know what changes they wanna make to make something look better. And they're just using AI as sort of a means to try out those ideas faster. But if someone's not a designer, I just don't see how they can sit in front of the computer and have that experience, because they don't know what they're asking for. The thing I've been thinking about more than anything is trying to figure out what is the magical workflow that a developer would want to have when they're trying to produce a design, especially, you know, as a non-designer. And I really don't think it's "try to describe what you want in a text area."
Justin Jackson:Alright. So last night, I had gone to a meetup at a local bar, had a few drinks, was back home, and was just listening to Adam talk about this stuff. And this is why I listen to podcasts: to hear how somebody else is thinking and processing stuff. As I was listening to Adam process this stuff out loud, it was igniting all sorts of thoughts in me. And so I sent Adam a bunch of voice memos that I think would be interesting for you to hear. Again, I'd had a few beers, and I'm slurring my words a little bit, but these are the voice memos I sent to Adam last night.
Justin's voice memos:I met this guy at a podcast conference who was building an AI-powered video editor. And as soon as he talked to me, I was kind of like, okay, whatever, I know a lot of people doing this. He's like, no, no, no. The thing is, everybody else is building an AI-powered video editor, but they're using these existing models where the corpus of data is, you know, all of the text on the internet, and they are then trying to predict what people might want when it comes to editing video.
Justin's voice memos:And his whole approach was, and he was still pretty early on, but I just thought his perspective was so interesting, his whole approach was: what if we trained an AI on how thousands of video editors edit video? Is there a way for us to record and build a large model based on the actions of thousands and thousands of video editors? What do they do every day? How do they edit clips?
Justin's voice memos:How do they do transitions? Everything like that. And then create a corpus of data around that that we use to power the model. And I just thought that was such an interesting perspective that I hadn't heard talked about. Everyone's using these off the shelf models for even things that might not relate to what they're trying to do, in this case video editing.
Justin's voice memos:So I thought that was interesting. And I lost the second thought. I'll think of it in a bit. I just had my second thought. So as you know, I've been exploring and experimenting with AI animation.
Justin's voice memos:And so much of the tooling that the people at the forefront of AI animation are using is still not good. It feels like all the opportunity is in how the actual interface is structured. So for example, if you're going to prompt something like Sora 2 to make an animated scene, you can feed it these reference images; they're not quite storyboards, but basically it's a single image with a character, another character, and then the background scene. You have these kind of three distinct objects all in one image. And if you feed Sora 2, or really any of the models, that to start, your output is way better, way more consistent, and you basically have these characters and backgrounds you can use to get consistent scenes, where you can go, okay, we're going to do a close-up here, and then we're going to zoom out, and then we're going to have them... You can direct them around the scene a lot better.
Justin's voice memos:But it's all so rudimentary. The UI and the user experience and the way things are organized are terrible. And what's weird to me about AI is that the UI is so unimaginative and unintuitive. We're using these text boxes to describe visual things, when you could better organize the inputs, the UI, the interface for doing this stuff. Even in a rudimentary way: for making animation, for example, you could have one input that says upload all of your character shots separately as transparent PNGs (here are the characters), then upload your backgrounds, and that's a separate box.
Justin's voice memos:And then let's go through and describe each scene that you want. There's no UI for any of that. People are just hacking these existing chat boxes, and the input's okay, but it's not even close to how good it could be. The models can do an okay job, but they're reading these really rudimentary inputs when the user inputs could be better organized. So for example, as a product person who's interested in animation, I can think of a user flow and an interface that would connect to the LLM and consistently produce better animation.
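To make the structured-input idea concrete, here's a minimal sketch of what that data model could look like: characters and backgrounds collected as distinct fields rather than one free-form text box, then compiled into a single prompt for the model. Every name here (`Character`, `Scene`, `AnimationBrief`) is hypothetical; no real product or model API is being described.

```python
# Hypothetical sketch: collect characters, backgrounds, and scenes as
# separate structured inputs, then compile them into one model prompt.
from dataclasses import dataclass, field

@dataclass
class Character:
    name: str
    image_path: str  # transparent PNG, uploaded separately from backgrounds

@dataclass
class Scene:
    description: str       # e.g. "close-up on the hero, then zoom out"
    characters: list[str]  # names of the characters present in this scene
    background: str        # which named background image to use

@dataclass
class AnimationBrief:
    characters: list[Character] = field(default_factory=list)
    backgrounds: dict[str, str] = field(default_factory=dict)  # name -> PNG path
    scenes: list[Scene] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Compile the structured inputs into a single text prompt."""
        lines = ["Characters (use these reference images consistently):"]
        lines += [f"- {c.name}: {c.image_path}" for c in self.characters]
        lines.append("Backgrounds:")
        lines += [f"- {n}: {p}" for n, p in self.backgrounds.items()]
        for i, s in enumerate(self.scenes, 1):
            lines.append(
                f"Scene {i} [{s.background}] with "
                f"{', '.join(s.characters)}: {s.description}"
            )
        return "\n".join(lines)
```

The point of the sketch is that the UI, not the model, enforces the separation: the user never sees one big text area, but the model still receives a single, well-organized prompt.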
Justin's voice memos:And the problem is the UI layer and then how the LLM is receiving it. But it feels like there are ways to structure things on the input side that could dramatically improve the output. Maybe, I don't know if this will be the final thought, but it's also interesting to me, and it feels like an opportunity, that despite how good all the new models are, like Gemini 3, and they're very good, it still feels like most people have multiple tasks where they are just continually re-prompting, re-prompting, re-prompting to try to get to the output or the outcome that they want. So back to animation again.
Justin's voice memos:I was just trying to make these little visual interstitials, these animated hand-drawn illustrations, to illustrate these concepts I was talking about in a video. And I've got a pretty good starter image, but I'm prompting over and over again. I'm trying to show that when a customer really wants a product, they're willing to bust through a brick wall to take out their money and buy the product. They're willing to overcome an enormous amount of friction. And so I'm trying to get Grok to animate this guy busting through a brick wall.
Justin's voice memos:And sometimes it would get the brick wall right, but then it would mess up on the other side. Most of the time it just failed the brick wall test. I could not get this character to just bust through this brick wall. And to me, it's because this UI is so blunt. It's such a blunt instrument, where you prompt something, you wait for it to grind, and then you get the output and you're like, no, no, no, no.
Justin's voice memos:That's wrong. But now the UI has to go back and basically regenerate the whole thing. So much of re-prompting is just regenerating the same thing. It feels like you're rolling the dice. Prompting in AI still feels like rolling the dice.
Justin's voice memos:And it feels like now we're just spending most of our day rolling the dice. Hours and hours and hours spent just rolling the dice, hoping we get lucky and one of these prompts magically produces the result we want. And again, maybe it's an input problem, maybe it's an organizing-of-inputs problem, maybe it's giving users more fine-grained control along the way. Maybe, instead of producing this one blob of output, it's producing more distinct blobs, so you can say, okay, this part of the blob is good, but these parts here you messed up on. It feels like there's something there.
Justin's voice memos:It feels like there's something about that. We all know what that experience is like, especially with design work, but with other things too, where you're prompting, prompting, prompting, and it just cannot get it right. And you end up going with whatever mediocre result you got from hours and hours of prompting. This is less so with code now. It feels like code has gotten better, but code seems like an easy problem for AIs to solve.
Justin's voice memos:But there are so many other things, like design, animation, and graphic design, where it feels like this particular friction point is still true.
Justin Jackson:So listening back to these voice memos, I think what's interesting is there's so much effort and money being spent on improving these models, which I think is totally fair. But there hasn't been a lot of innovation on the UI side. There are little apps, like one I'll share in the show notes called fstop.vercel.app, where basically it gives you multiple UI inputs in order to create a better prompt for Midjourney or Flux or Nano Banana. But even then, you're still delivering everything in a single blob of a chat input.
Justin Jackson:Right? So you have a single prompt that's a blob, that's then creating a blob output, and you can't break those inputs or those outputs into meaningful pieces. Right? I mean, you can do this a little bit with code. In code, this probably works the best, where you can say, okay, I want you to lock what you've done here and don't change that anymore, and then I want you to go over and fix this part.
Justin Jackson:But still, there's not a lot of fine-grained control on the input side or the output side: being able to say, okay, I want to give you multiple distinct inputs. And if that prompt doesn't work, instead of reformatting the whole prompt, where you're just rolling the dice again, it would be nice to say, I'm gonna keep these four inputs, but these two inputs, I'm gonna change those two variables and see what happens. In my experience, and maybe this isn't possible, but in my experience, even if you redo a prompt and you just try to change one or two things, you never know what you're gonna get out. It might just go a totally different direction. And you're like, no.
Justin Jackson:No. No. What you had before was pretty good. I just needed to adjust a few things. So, yeah, it feels like there's something there.
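The "keep these four inputs, change these two" workflow Justin describes could be sketched at the UI layer even without model support: treat the prompt as named parts and fingerprint each exact combination of inputs plus seed, so the app knows when nothing actually changed and a re-roll would be pure dice-rolling. This is a hypothetical sketch; `PromptSession` is not any real product's API, and `generate` calls to an actual model are deliberately left out.

```python
# Hypothetical sketch: a prompt as named parts instead of one blob.
# The fingerprint identifies an exact combination of inputs + seed, so a
# UI could cache outputs and only regenerate when a part really changed.
import hashlib

class PromptSession:
    def __init__(self, seed: int = 42):
        self.seed = seed                 # reused across runs for stability
        self.parts: dict[str, str] = {}  # e.g. "character", "action", "background"

    def set_part(self, name: str, text: str) -> None:
        self.parts[name] = text

    def compile(self) -> str:
        # Deterministic ordering: the same parts always yield the same prompt.
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.parts.items()))

    def fingerprint(self) -> str:
        # Hash of seed + compiled prompt; unchanged inputs hash identically,
        # which is the signal to skip regeneration entirely.
        payload = f"{self.seed}|{self.compile()}".encode()
        return hashlib.sha256(payload).hexdigest()[:12]
```

Whether today's hosted models would honor that kind of partial stability is an open question, but the bookkeeping on the input side is cheap either way.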
Justin Jackson:I'd be curious if any of these thoughts are interesting to you. If you have any thoughts, let me know.