Creative Flux

Pierson Marks (@piersonmarks) and Bilal Tahir (@deepwhitman) dig into Google's newest image model, codenamed "Nano Banana" (now Gemini 2.5 Flash Image), the underrated technique of using LLMs to generate SVGs, AI industry gossip, and actionable creative workflow tools.

00:00 Advancements in Generative AI and Image Editing
04:02 Achieving Consistency in AI-Generated Media
08:39 Exploring New Tools
17:30 The Future of AI in Media and Communication
27:07 AI Industry Gossip
34:00 Satori MCP

Links: 
  • Nano Banana (Gemini 2.5 Flash Image Preview): https://blog.google/intl/en-mena/product-updates/explore-get-answers/nano-banana-image-editing-in-gemini-just-got-a-major-upgrade/
  • Flipbook Generator: https://www.hackyexperiments.com/micro/flip-book
  • Satori MCP Server Repo: https://github.com/Jellypod-Inc/satori-mcp-server

What is Creative Flux?

Each week, industry experts Pierson Marks (@piersonmarks) & Bilal Tahir (@deepwhitman) bring you practical insights, creative workflows, and the latest breakthroughs in generative media. We cover everything that's happening in AI-powered audio, video, and image creation, sharing hands-on tips and industry news straight from the front lines.

Topics include new generative models, creative best practices, open-source tools, real-world use cases, and the evolving landscape of AI-driven content creation.

Pierson Marks (00:02.528)
Well, we're on episode 11. It's great.

Bilal Tahir (00:06.25)
Hello, hello, yeah, wow, made it into the teens.

Pierson Marks (00:09.824)
Made it into the teens. Oh, but this is not the teens yet. 11. Almost, almost there. No, no, no, no. 11's not a teen. 13, 14... 12. So.

Bilal Tahir (00:15.438)
Oh, and isn't 11 a teen? I don't know. Is it 13?

Bilal Tahir (00:22.862)
Oh yeah, that's true. I don't know, in my head I think of 10 to 19, or 11 through 19, as the teen years. It's interesting, because you have, what do you have, the 20s, the 30s, each decade. But "the 10s" sounds weird. And then I don't know what you call the first ten years. Like the aughts or the ones. Yeah.

Pierson Marks (00:31.113)
Right.

Pierson Marks (00:43.134)
Yeah, you'd be a baby. Yeah, totally. It's interesting. It reminds me how in other countries, sometimes the ground floor is floor zero, like level zero. In the United States, it's level one. And...

Bilal Tahir (00:56.685)
Mm-hmm.

Pierson Marks (00:59.412)
Like, is it zero-based? Yeah, it's just interesting how for some people it makes sense to start at zero: this is the zeroth floor, indexed at zero, and then the second floor is the first floor, and the third... But we index at one. Base one.

Bilal Tahir (01:06.338)
Right, yeah.

Bilal Tahir (01:13.676)
Yeah, yeah, yeah. It always confuses me too. I was in Europe and I kept getting off on the first floor. And then I was like,

Pierson Marks (01:18.752)
You're like, right, right. Yeah. Off-by-one errors. Yeah, I was at the AI Tinkerers meetup the other day in San Francisco, the one we were just talking about, and somebody said: yeah, there are two hard things in computer science, naming things, cache invalidation, and off-by-one errors.

Bilal Tahir (01:24.876)
Off-by-one errors. Happens everywhere.

Bilal Tahir (01:42.454)
Yep, yeah.

Pierson Marks (01:43.008)
Those are the two hardest things. And I was just like, yeah, that's funny, I forgot the off-by-one errors, and that makes sense. But yeah, another interesting week. So if you're hopping on for the first time, I'm Pierson and this is Bilal, and yeah, we just chat about...

Bilal Tahir (01:50.348)
Yeah. Classic.

Pierson Marks (02:05.32)
generative AI media stuff. This week was pretty big, because maybe it was the ChatGPT moment for images. I don't know. Maybe it wasn't that widely talked about.

Bilal Tahir (02:12.674)
Hmm.

I mean, I think it was a huge improvement. So we're talking about Nano Banana. It was a model that came on LMArena called Nano Banana and made some waves, because it really just made image editing way better than it was before. And then it was revealed a couple of days ago that it was actually Google's model. We kind of knew it was Google's model, they had been hinting. But the official name now is Gemini 2.5 Flash Image, and a lot of people are like: dude, Nano Banana sounds so cool, and now it's Gemini 2.5 Flash Image? Ugh, what are you doing? But, you know, it's available now in Google AI Studio and all,

Pierson Marks (02:47.935)
Right.

Bilal Tahir (02:53.552)
and Replicate, and I've been playing around with it too, and it's pretty good. I would say image editing has been steadily getting better. We talked about Qwen Image last week, which was really good as well, and Nano Banana just takes it to the next level. It had almost a 200-point Elo rating jump on LMArena, so that's pretty significant. I wouldn't call it a Photoshop replacement, but you can do a lot of stuff: you can remove the background, you can change the glasses on my face,

and my face pretty much remains the same. I think one of the big challenges with AI models was that it would replace the glasses, but then it would kind of change the face a bit or add some other artificial effects. Those things are now much better, and that's why these fine-grained edits are becoming more possible, which opens up a whole dimension of use cases that you had to use Photoshop for before.

Pierson Marks (03:50.912)
Totally. No, it's super interesting. I mean, when we were talking briefly about Nano Banana last week, there was one phrase you said. You said Nano Banana like four times in a row, and it was so funny. It was like, Nano Banana, Nano Banana. It's a tongue twister. It was just so funny, because I was editing a little bit of the last episode. But yeah, it's really cool. I mean, image consistency, being able to pinpoint certain things, and

Bilal Tahir (04:07.086)
I'm really bad at explaining.

Pierson Marks (04:19.42)
actually edit them, is super powerful. Because, like we mentioned before, there are a lot of creatives, and if you're a creative, you don't necessarily want to figure out how to use the tool. You just want the tool to work with how your brain works. So when you're like...

Bilal Tahir (04:36.782)
Mm-hmm.

Pierson Marks (04:38.292)
If you're a painter, you don't want to have a canvas and have to think about how your fingers need to hold the brush at a certain angle and then stroke the canvas to paint something nice. You're like: no, I have an image in my head of this panda bear in the woods, and I'm just drawing on this canvas. You're not thinking about the motion of your hand; it just comes naturally. And I think we're entering more of that era of AI generative media tools,

Pierson Marks (05:08.226)
where it's less about the tool and much more about hooking almost straight into your brain and being able to just do the thing, versus figuring out how to do the thing. And it's cool.

Bilal Tahir (05:20.844)
Yeah, yeah, for sure. I mean, I'm really one of those people that struggles a lot with communicating exactly what's in my head. And it can be so frustrating, because you have this amazing, powerful model there, but you just can't really communicate properly. That's why I think there's still such an alpha for people who are able to express accurately exactly what's in their head. Until we get some brain-machine interface, Neuralink machines or something that hooks up directly to the LLM, it's still going to be a problem.

Pierson Marks (05:45.77)
Neuralink.

Bilal Tahir (05:53.088)
But Nano Banana does a lot of interesting things. I think one of the really interesting use cases I can see is that you can have the same person in different shots, which I've seen examples of. That really opens up a way to create consistent storylines, et cetera, because you can generate a video based on a starting frame, then take the same character, put them in a different setting, and start a video from there. And so I think we're going to see a

lot more consistent stories like that because of this. The other one is just generating images of yourself in different settings: a LinkedIn photo, a bridal shower, right? Maybe you're in Paris or whatever. So the classic image editing cases just became even easier.

You're probably going to see, I think, a lot of social media using these kinds of things almost like a filter, letting you put yourself in whatever position, whatever style, et cetera.

Pierson Marks (06:55.306)
Right. Right. I saw this the other day. So Kling now, I forget, is this a new thing? But Kling, you can add in end frames.

Bilal Tahir (07:05.774)
Yeah, start-to-end frame. They launched a recent update where you can do start and end. It's actually pretty cool, you can create some interesting videos with it. I saw one where a guy did this series where the start frame was him shooting a basketball, and the end frame was the basket, like a kilometer or a mile away or whatever. And in the video that came out, he takes the shot, it goes up into the clouds, and then

Pierson Marks (07:15.36)
And think about it.

Bilal Tahir (07:32.206)
goes in the basket. And I was like, oh, that's actually pretty cool.

Pierson Marks (07:35.002)
That is cool. That's super cool. That reminds me of the Dude Perfect guys. Their whole channel is built on trick shots, and they're all just great.

Bilal Tahir (07:43.702)
yeah, that takes like ages to do. That's like, I don't know how they have the patience for that, but I love their reactions when they finally get it.

Pierson Marks (07:48.158)
Yeah, I mean, this is one of the biggest. Totally. No, it's super interesting. But yeah, the Kling start and end frames, that's super powerful, because you pair that with

Nano Banana, now Gemini 2.5 Flash Image, blah blah blah. Horrible naming, come on everybody, let's go. You know, we do this every week, we talk about this, and I'm just lost. Okay, this is Gemini 2.5 Flash with Image. Is it different than Gemini 2.5 Flash? Wait, wait, correct me here, right? There's two Gemini 2.5 Flash models? Yeah, Gemini, sorry.

Bilal Tahir (08:03.682)
Right.

Bilal Tahir (08:16.909)
Right.

Yeah, yeah, yeah.

Right, right. So Gemini 2.5 Flash is an LLM, but they do have an image model. So it's not really a standalone model, it's a system. Technically it's called Gemini 2.5 Flash Image Preview, but if you go to the API, it's the same model. I think, like ChatGPT, it just creates the image under the hood, but it uses the same model. And actually one of the, I guess, complaints about this is that because it uses Flash, it's not the smartest model.

Pierson Marks (08:42.516)
Right.

Bilal Tahir (08:51.664)
So with the prompts, coming back to our communication problem, people felt like they haven't been able to just give it a vaguer prompt and have it run with it, because it's a small model. So I wonder if Google will release a Pro version, basically the same image technology but paired with 2.5 Pro or even 3. That would probably make it a lot more powerful. So, interesting.

Pierson Marks (09:16.202)
Totally. You might know a little bit more than me. Neither of us are AI researchers, so we've never been in the depths of training a model like this. But my understanding is that with these Flash models, you pretty much have a big model and then distill it down into the Flash version. And I wonder if that's what's happening here too. So we could expect that there's going to be a 2.5...

Because what's Gemini? It's Flash. Right. Right. That's also interesting too. I mean, the ultra-big models, like Meta's Llama Behemoth or whatever, that got booted, Llama 4 I think. They came out with three sizes: small, medium, large.

Bilal Tahir (09:47.714)
Right, it's Pro and then Flash. And there used to be an Ultra one too, but I don't know if that's still a thing.

Bilal Tahir (10:02.286)
Mm-hmm.

Bilal Tahir (10:09.858)
Right.

Pierson Marks (10:10.1)
But the top-end large one, it was just too expensive. It's not going to be comparable to...

Bilal Tahir (10:15.054)
Yeah, well, it was training, and then of course Zuck had to pivot, because Llama 4 sucked and they'll probably never release it. I think it was called Behemoth; they're probably never going to release that. But Meta is interesting, I don't know if you saw the news. Well, before I get into the gossip, just to close the loop on the editing thing: I think what's super powerful about this, like the start and end frame thing you mentioned, is the fine-grained control you get. You can have your character start here, and then generate

Pierson Marks (10:19.989)
Right.

Bilal Tahir (10:45.168)
an edited image where your character looks to the side, and then you can generate a video: you can say he slowly gazes from the start position to the right, and it'll actually follow it exactly. And the cool thing about this is that rather than generating a video and hoping it comes out right, once you have the start and end frames you have a lot more control, and you can be fairly confident that the five-second video you generate, which takes longer and is costlier,

will probably be pretty good. So I think it makes it much easier, or at least cheaper, to come up with a more tightly controlled movie. So yeah, very powerful. I imagine you're going to have a lot of people who do a start and end frame, then use that end frame as the start of the next clip. And for multi-shot sequences, maybe the next shot is a completely different image, stitched together. So there are probably some amazing pipelines you can create

Pierson Marks (11:26.016)
So I think.

Bilal Tahir (11:44.969)
using this.

Pierson Marks (11:45.066)
No, I bet. I haven't used Google Flow with Veo 3. I wonder if... So Google Flow is a product by Google, like a creative studio, and it kind of felt, at least to me...

Bilal Tahir (11:52.79)
Yeah. Yeah.

Pierson Marks (11:59.508)
This is the mental model, like the sandwich approach, that I think they're going toward: you create the bookends, the first frame and the last frame, and then Veo 3 fills in the middle. And I think it's kind of a piecemeal approach right now, where you have Nano Banana, the Flash image model, then you have Veo 3, then you can do Kling, you can do all of these. It's still a very fragmented product,

Bilal Tahir (12:03.0)
Mm-hmm.

Bilal Tahir (12:10.723)
Right.

Pierson Marks (12:29.442)
which is super exciting. I generally think it's so cool to see such a broken industry, where there's so much cool technology, but it's so fragmented, so broken, that there's a lot of alpha in literally just listening to Creative Flux every week and figuring out what people care about, but also just being able

Bilal Tahir (12:41.006)
Right.

Pierson Marks (12:53.768)
to actually go and use these things. People will talk about stuff, you'll see it on Twitter, like, that was cool, and you'll never actually use it. Literally, you'll put yourself in the top 0.0001% of society if, the next time you're on Twitter or whatever and you hear somebody say "hey, check out Nano Banana," rather than just thinking "that's cool," you actually go play around with it for ten minutes. You're...

Bilal Tahir (13:15.501)
Hmm.

Pierson Marks (13:16.245)
in the top percentile, like, so small. You're seeing the future right in front of you, you're playing with the future, and it's so exciting, you know?

Bilal Tahir (13:19.481)
absolutely.

Bilal Tahir (13:24.086)
Yeah, 100%. That's why I love these tools and I love playing around with them: every time there's an update, there's stuff you can do that you weren't able to do previously. A completely new dimension opens up to you. And it's like...

Yeah, I see so much cool stuff on Twitter and other platforms, but you're right. Most people don't know about this, or if they do, they do the very lazy thing: generate one image or whatever. The real alpha is taking it and stitching it together with other models, or just taking that extra step and trying to put together something that's very unique and interesting. Yeah.

Pierson Marks (14:07.743)
Mm-hmm.

Pierson Marks (14:11.466)
Totally.

Yeah, I think I mentioned this before. I mean, one of my favorite things that I genuinely enjoy is, on a Friday night, having a glass of whiskey, sitting down in front of my computer, and coding, just doing whatever I like. I love programming so much, and I love building things and playing around with this stuff. It's very fun. It's relaxing, especially if you get into a flow state where you're working on something new. It's not like fixing a bug, sometimes that can be fun too, but you're really just

going from a blank slate and just building something. I'll mention later what I built this past weekend. But it's generally fun, and we're in a spot right now where it's like a toy box of all these things. I even sometimes struggle with knowing where to start, there's so much out there, and you just kind of go: hey, this weekend, or tonight, I'm just going to play around with Nano Banana and try to get some consistent

comic book frames, or what you did with the Space Needle in the background, which looked really cool. I mean, yeah, it's fun.

Bilal Tahir (15:13.07)
Right.

Bilal Tahir (15:19.522)
Yeah, I mean, it's the same with me. I love building these tools and stuff; it's low stakes and it's fun. And it's funny, you take a break from coding by coding, but it's true, you know. Yeah.

Pierson Marks (15:31.196)
People don't understand that. My dad, he's been in the solar industry for 30 years, a long time to be in renewable energy, starting 30 years ago. He always said to me: hey, if you find something that you're super passionate about and you like doing it, work won't feel like work. And I feel really lucky, because every day, I mean, we get to play around with

Bilal Tahir (15:50.776)
Yeah, same.

Pierson Marks (15:56.096)
these amazing things. I mean, everything from text models to video models, audio models, image models. And honestly, when I left Amazon to start Jellypod, it was just because, look, the most amazing technology is right here. I get to play around with it as much as I want and build something cool. So...

Bilal Tahir (16:19.416)
Yeah, yeah, you're like Jeff Bezos: you saw the statistic, it's growing by a thousand percent, and you're like, I've got to jump in. He had that with the internet. It was growing by a thousand percent, and he was like, well, I have to do something here.

Pierson Marks (16:25.984)
Right. What's the worst that could happen? But yeah.

Bilal Tahir (16:36.642)
But it's cool. I built a fun little tool. I wanted to be able to easily take one base image, create different poses, and make a nice flip-book GIF out of it. At first it was basically a script with a hard-coded array of prompts, each of which took the image and changed the pose: making a victory sign, smiling mischievously, or whatever. And I ran that in parallel, which was nice, because you can run up to 10 requests, I think,

Pierson Marks (16:48.352)
All right.

Bilal Tahir (17:06.626)
and it's pretty quick. It takes less than five seconds for the edits too, which is also pretty cool, compared to ChatGPT and stuff where it just takes ages. So it's great to have something quick and fast. It generates the 10 or so images, very similar but with different poses. Then the next part actually took me longer, because it turns out rendering a GIF in the browser is

Pierson Marks (17:15.061)
Right.

Bilal Tahir (17:28.952)
hard. I found this weird 15-year-old GitHub package called gif.js, and it was fine, because I wanted to use it in the browser. I didn't want to use FFmpeg or whatever on a server, because I want to make it free.

Pierson Marks (17:41.376)
Mm-hmm.

Bilal Tahir (17:43.214)
So I was able to figure that out. If you guys want, we'll link it, but you can generate your images in Google AI Studio, then upload multiple images, and it'll play them like a video. You can set the speed of the video and then create a GIF out of it, or "jif," I don't know which one, I always say GIF, but whatever. But yeah, it was pretty fun, and I can see some really cool memes coming out of that.
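
For anyone who wants to try the same pattern, here's a minimal sketch of that fan-out, assuming Replicate's JavaScript client and its hosted "google/nano-banana" model; the input field names and image URL are assumptions, so check the model page before relying on them. The parallel Promise.all is the part that makes it fast, and gif.js then stitches the frames client-side:

```ts
import Replicate from "replicate";

const replicate = new Replicate(); // reads REPLICATE_API_TOKEN from the env

const BASE_IMAGE = "https://example.com/base-character.png"; // placeholder
const POSES = [
  "same character, making a victory sign",
  "same character, smiling mischievously",
  "same character, arms crossed, looking smug",
  // ...up to ~10 pose prompts
];

// Fire one edit request per pose concurrently instead of sequentially.
const frames = await Promise.all(
  POSES.map((prompt) =>
    replicate.run("google/nano-banana", {
      input: { prompt, image_input: [BASE_IMAGE] }, // field names are assumptions
    })
  )
);

// In the browser, gif.js (the old package mentioned above) assembles frames:
//   const gif = new GIF({ workers: 2, quality: 10 });
//   for (const img of frameImgElements) gif.addFrame(img, { delay: 200 });
//   gif.on("finished", (blob) => window.open(URL.createObjectURL(blob)));
//   gif.render();
```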

Pierson Marks (18:08.03)
Right. So.

Bilal Tahir (18:08.782)
The other one I did, which was funny, is you can create a storyline. You can start with an image and say: OK, put a mustache on, then put a beard on, then put a parrot on his shoulder. When you do that, you get a progressively edited image, which is very similar to, if you've ever seen videos from The School of Life, that cool animation style where they take the same image and add layers to it. It's a very interesting visual. So yeah, lots of cool use cases like that.

Pierson Marks (18:33.908)
Right. Wait, was that a GIF also, the second one? So you'd have like a.

Bilal Tahir (18:40.182)
Yeah, yeah, because you can easily do it. I mean, you could do a video too, but a GIF is easy because on most platforms you don't have to press play, it just repeats, and the repeated effect kind of looks cool. You know, the loop effect looks kind of cool.

Pierson Marks (18:54.528)
That's super interesting. It reminded me of, and maybe this is what the School of Life videos are, I don't know what those are off the top of my head, but where you have a first frame, and it kind of fades into the next one, like a cloud that goes over, and then the next one fades in... yeah, I don't know. It's interesting. Maybe Remotion would be...

Bilal Tahir (19:07.694)
Yeah.

Bilal Tahir (19:14.528)
I think I know what you're saying. That's a good question.

Bilal Tahir (19:22.22)
Yeah.

Pierson Marks (19:22.248)
I wonder if there's something in the browser you could do that with.

Bilal Tahir (19:24.206)
Yeah, well, you posted something that I saw too: Economics Explained. It's funny, I've been following that channel for years, and it's a really cool channel. Their whole format is they pick a country, like the US or Spain or whatever, and first do a 20-minute analysis on where the country is at through the lens of economics. Then they have a leaderboard where they rank the economy, which I always find very fascinating. And the way they do it, obviously, it's your basic

Pierson Marks (19:32.352)
Right, YouTube.

Bilal Tahir (19:54.233)
faceless video channel, you know: they mostly show B-roll footage of people crossing the street or whatever, and they overlay that with graphs and charts, because they're explaining economics data. And apparently they're building a library using Remotion to do that automatically. It's so funny, because we at Jellypod have thought about this for a while now, because a lot of our users create podcasts based on educational content. They're explaining a certain topic, and usually the topic has some data. And right now we just stick

to the audio. But as we move into the video format, one of the things we want to explore is supplementing that audio with data charts and stuff, because I think it's super powerful. That's literally why people like going to college lectures or whatever: when the professor talks, he actually puts up a slide. And now...

Pierson Marks (20:44.448)
All right.

Bilal Tahir (20:45.194)
So now you can have that automatically, which I think would be super powerful. Imagine I'm talking about the economy of the US versus Mexico, whatever. The AI takes that transcript and figures out: okay, I should probably pull the data related to this from the web; it should probably be a line chart or a bar chart. And then it has a React component that just populates automatically. I think that would be very powerful.
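
A rough sketch of how that transcript-to-chart pipeline could look. The OpenAI client call is a real API; the JSON schema, model choice, and the downstream renderer are illustrative assumptions:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the env

// Ask the LLM to turn a transcript span into a declarative chart spec (JSON).
const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  response_format: { type: "json_object" },
  messages: [
    {
      role: "system",
      content:
        "Extract one chart from the transcript. Reply as JSON: " +
        '{"type":"line"|"bar","title":string,' +
        '"series":[{"label":string,"points":[[number,number]]}]}',
    },
    { role: "user", content: "...US versus Mexico GDP growth since 2000..." },
  ],
});

const spec = JSON.parse(res.choices[0].message.content ?? "{}");

// A deterministic renderer (Recharts, D3, or a Remotion composition) then
// draws `spec` -- no generative image model anywhere in the loop.
console.log(spec.title, spec.type);
```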

Pierson Marks (21:09.002)
This is... 100%. I am so genuinely excited for this specific thing, because it is feasible, and this will segue into a tool that I built in a second. But so many people are focused on generative media in terms of generating the image, generating the video, and that's super cool. And

that's going to continue to improve. But there are a lot of things you can generate today without generative image and video models. When you go to Excel and plot your sheet into a graph, it's not using a generative image model to make that plot. It's literally taking the data, putting it on axes, and drawing a line through it. That's something computers can do, and do well, today; we've done it for decades. Right. And when you have an LLM,

Bilal Tahir (21:39.533)
Right.

Bilal Tahir (21:49.581)
Right.

Bilal Tahir (21:57.442)
Hmm.

Pierson Marks (22:01.362)
you can plug in the LLM as the brain that takes that information and formats it the right way, so that a traditional rendering engine we already have today can render it, whether that's an SVG, a scalable vector graphic, which is just dots and lines and paths and whatever, and which can have animations as well.

Pierson Marks (22:26.016)
And so much can be done with SVGs in general. It's a very, very powerful format, and LLMs can write code very well, so they can write SVGs.

Bilal Tahir (22:30.679)
Yeah, yeah. Well, I was talking to someone from OpenAI, and I can't reveal too much, they were obviously very private about the information they could share. But one of the things they told me was that one of the emergent behaviors of GPT-5 was that it got really good at generating SVGs. I haven't personally tested this, but apparently they were surprised, because they didn't directly train it for that, and it got really good at generating SVGs. So you're right: SVGs are super powerful, because they're super scalable and

Pierson Marks (22:55.264)
Mmm.

Bilal Tahir (23:06.704)
animatable, because it's just code. And so you can interpolate, you can make things move. If you draw a stick figure, the logic is built in: oh, I want to move an arm, you can do that. So I imagine companies like the Figmas of the world are going to make this much easier to do as well. But for now, you can do it yourself.

Pierson Marks (23:31.154)
Yeah, it's super interesting. We could take this multiple ways. One thing I want to explain to the audience right now is the difference between SVGs and traditional images, because I think it's important if you're interested in images and media in general. When you see an image on your computer or on your phone, there are, as everybody knows, a bunch of pixels on your screen, and each pixel has a few different color components in it:

red, green, and blue. Based on the percentages of red, green, and blue, each pixel takes on a certain color. So an image on the screen that's 100 by 100, width and height, is 100 pixels by 100 pixels. You set the color of each little dot, and you have 100 times 100, 10,000 pixels on the screen.

And that's kind of how images have always worked online, until we had vectors. With vectors, rather than saying, hey, I want pixel one to be this color, pixel two to be this color, pixel three to be this color, all the way through those thousands, you just say: hey, I want a line from point A to point B, draw that line. So the data representation is tiny. It's just, hey, I have a point A and a point B, draw a line between them. Three pieces of data.

And that's a vector. What's super cool about this is that it's independent of the screen resolution. You could have a 4K monitor, a 180p monitor, a 720p, or an IMAX-sized display. The resolution doesn't matter, because it's literally: a point here, a point here, draw the line. You don't have to worry about pixels anymore, so you can scale this to whatever screen size without losing any resolution. And that's super

Bilal Tahir (25:19.598)
Right.

Yeah.

Bilal Tahir (25:28.675)
Hmm.

Pierson Marks (25:29.772)
powerful, because you don't have to worry about it not looking good on a certain screen. It will always look good. So I wonder if we'll see more and more things move away from raster images toward vector graphics, because they're less expensive to store and transmit. Although it's hard, because an image like the one we have right here is just a bunch of pixels, and it's going to be very hard to vectorize the right way all the time.

Bilal Tahir (25:48.472)
Yeah. Right.

Pierson Marks (25:59.454)
But for static images, I mean, just vectorize them. And you could probably do it pretty well.
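
To make the comparison concrete, here's a hand-written example (not from the episode): the vector form of a diagonal line is two endpoints plus a viewBox, versus 10,000 stored pixel colors for a 100-by-100 raster:

```ts
// Two endpoints instead of 10,000 pixels; the viewBox makes it resolution-free.
const svg = `
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <line x1="10" y1="90" x2="90" y2="10" stroke="black" stroke-width="2"/>
</svg>`;

// The same markup renders sharp as a 64px icon or on a 4K display.
document.body.insertAdjacentHTML("beforeend", svg);
```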

Bilal Tahir (26:02.742)
Absolutely. It's so fascinating. And it's controllable too, actually. I think we mentioned it before, but one of the coolest things I saw was from Duolingo. It's something I've been thinking about too, because people have been trying to get to this end state where you talk to an AI and a person replies, where you can see their mouth moving, see them smiling,

Pierson Marks (26:15.04)
Mm.

Bilal Tahir (26:27.79)
When a lot of people think about that, they think: okay, we'll literally be generating, in real time, an avatar who's talking. And you can do that, but the cost is really insane. So what Duolingo did, which was very cool, was they basically said, okay, we don't want a fully photoreal person with every expression. Let's just make a scalable-vector-graphic-style avatar. So they have this avatar, I forget her name. You have a base image, and with SVG,

what they have is layers: the mouth is a layer, the nose is a layer. These are all mini SVG lines and stuff, like what Pierson was saying, that they've drawn in. And the cool thing is you can change it: if you want me to smile, you just point the line up. You can animate it, say, okay, move it from here to here, and that's a smile. So you can create those animations. What they did was they started with this base avatar, talked to human interface designers and stuff, and came up with 50 expressions, the most popular expressions

people make when they talk, right? Smile, frown, sad. And those 50 images are pre-programmed, so there's no AI rendering, right? It's just an if-then statement. What they did was: Annie would talk to you, they'd use text-to-speech to create the audio, then, using the transcript, a model classifies it. Does she smile? Is she sad? Does she frown? And based on that, and actually, if you look at the demo, and you can use it in their app, it basically does the job. It's like 99,

99% there in terms of talking to a character. It helps that she's animated, because she's not a real human. Even if her mouth just moves the same way every time, it's not off-putting, because you're used to cartoons talking with the same mouth movement. So, very cool. And I'm surprised we haven't seen the Character AIs of the world, or other companionship apps, really embracing this. Because think about it: we have the text-to-speech,

and if we can just put these avatars on top, which is just code, just a static if-then statement, the cost doesn't go up at that point. You can have a much more lifelike experience with your AI.
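
A sketch of that classify-then-trigger loop. The expression list, the classify() call, and the avatar object are hypothetical stand-ins; with Rive specifically, the last step would set a state machine input instead:

```ts
// Hypothetical stand-ins, declared so the sketch type-checks:
declare function classify(text: string, labels: string[]): Promise<string>;
declare const avatar: { play(expression: Expression): void };

// The 50 pre-built expressions collapse to a closed label set.
type Expression = "smile" | "frown" | "sad" | "surprised" | "neutral";
const LABELS: Expression[] = ["smile", "frown", "sad", "surprised", "neutral"];

// TTS produces the audio; in parallel, a cheap model labels the transcript.
async function animateReply(utterance: string): Promise<void> {
  const label = (await classify(utterance, LABELS)) as Expression;
  avatar.play(label); // pure if-then: triggers a pre-made animation, no AI rendering
}
```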

Pierson Marks (28:40.564)
No, it's super exciting. I don't know. This is like those conversations we have where, there's a term for this when you're building a startup, having, not disagreeable, contrarian opinions, unconventional opinions, I forget the exact phrase off the top of my head. But everyone's so focused on the other stuff, when, hey,

LLMs can actually do this today. They can write the code to do it today. And I mean, that's what we're building, we're working on it. But I wonder if people are sleeping on it. It's like a hidden secret: there are people behind the scenes working on this, doing it, like the Economics Explained guys and Duolingo. It's really cool. I mean, so...

Bilal Tahir (29:26.338)
Yeah, so for anyone listening, if that inspired you, please go ahead and do it. There are so many cool applications you can build using classification and then pre-programmed things. I did this fun little project, I think six months ago now, and it would probably be way cleaner now, but it was standup comedy practice. I would speak into a mic, it would transcribe in real time and send it to an LLM, and then I had pre-programmed sound effects:

applause, boo, and so on. The LLM would decide: was it a bad joke, or a controversial joke, I even had the "ooh" sound, and it would then play that sound. It was actually pretty cool. It gave me great feedback on whether a joke was landing or not, you know.

Pierson Marks (29:58.602)
haha

Pierson Marks (30:04.019)
Right, right.

Pierson Marks (30:16.704)
That's hilarious. You're the comedian-practice AI. You know, these are the types of little vibe-coded weekend projects you throw in the App Store, even for free, and it would probably kind of blow up. I mean, it's cool. It's funny.
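
A browser-side sketch of that loop, using the Web Speech API for live transcription; classifyJoke() and the sound file paths are hypothetical:

```ts
// Hypothetical LLM call that maps a joke to a crowd reaction.
declare function classifyJoke(text: string): Promise<"applause" | "boo" | "ooh">;

const SOUNDS = {
  applause: "/sounds/applause.mp3", // placeholder asset paths
  boo: "/sounds/boo.mp3",
  ooh: "/sounds/ooh.mp3",
};

// Chrome exposes SpeechRecognition behind a webkit prefix.
const Recognition =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
const rec = new Recognition();
rec.continuous = true;

rec.onresult = async (event: any) => {
  const line = event.results[event.results.length - 1][0].transcript;
  const reaction = await classifyJoke(line); // bad joke? controversial joke?
  new Audio(SOUNDS[reaction]).play();        // pre-programmed sound effect
};

rec.start();
```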

Bilal Tahir (30:21.357)
Yeah.

Bilal Tahir (30:34.701)
Yeah.

Yeah, I might look into that, I think it's very interesting. For anyone interested in checking out what Duolingo did, I think the tech they used is called Rive. Very interesting technology. I hadn't heard about it, but their site looks super nice, I checked it out. I feel like you need to be kind of a designer, because there are a lot of design patterns that go into it. But they have this whole Photoshop-parallel concept; they call it bones and stuff, and you have to have made the skeleton. So a lot of work there, but very cool stuff.

Pierson Marks (31:12.596)
Yeah, Rive... I remember when I first ran across Remotion, I also came across them, and it was like, interesting, which is going to work better for our use case? I think we definitely made the right choice, because Rive seems very powerful especially if you're creating interactive content that's less static, more dynamic.

Bilal Tahir (31:18.125)
Oh really? Yeah.

Bilal Tahir (31:28.258)
Mm-hmm.

Bilal Tahir (31:40.246)
Right. And it's the only one, as far as I know, that actually exports to code. Most tools just give you a UI graphic for designers, right? Here, you design it, but then they give you the code that generated it, so you can programmatically update it. And that's why Duolingo used Rive: their designers came up with the avatar and the expressions and stuff, then gave it to the developers, like, all right, now put this in the app. So, very powerful. Yeah.

Pierson Marks (32:04.672)
Right.

Well, we'll tag that in the show notes. And maybe this week, one of my weekend projects is to figure out sending a newsletter out to everybody as well, kind of recapping what we talked about and summarizing the thoughts at the end in a more written format. So we'd have the blog and the podcast, and if you subscribe, we'll send the episode to your email inbox. You can read the summary, see some of the show notes, and then also listen in your favorite podcast player.

Because I think it's powerful. We talked about some really cool things here. I mean, we talked about Google's Nano Banana, Gemini 2.5 Flash Image; we talked about Kling real quick, and Flow; we talked about Rive; we talked about what you built over there. And yeah, how can people use the thing you built?

Bilal Tahir (32:47.086)
Mm-hmm.

Bilal Tahir (32:53.614)
The stand-up one, or the easy GIF one? They're all out there. I mean, the stand-up one is actually an open-source repo. I haven't touched it in months, but maybe I'll brush it up. I think that could be interesting.

Pierson Marks (32:55.956)
The easy GIF one.

Pierson Marks (33:03.259)
Yeah, that's fun. Yeah, it's cool. I did an open-source project this weekend too, which kind of segues into what we were talking about with SVGs. For anybody out there, I won't dig into MCP that much, because we could have a whole other conversation on it, which probably does make sense, and it's not exactly generative media. But MCP stands for Model Context Protocol. Just like on the internet we use HTTP,

Bilal Tahir (33:13.548)
Yeah, yeah.

Pierson Marks (33:31.784)
which is a way for your browser to communicate with servers, that protocol, like https://google.com, you put the link in there. MCP is just another way for agents, like your LLM, to communicate with a server. It's similar, it's built on top of HTTP, but the protocol is built for LLMs, because you describe the endpoints in natural language, and they're more like workflows versus resources. I won't dig into it.

But MCP is just a way for your Claude or ChatGPT or Gemini to hook into services out on the internet. Very powerful.

So this past weekend, I built an MCP server called Satori MCP. What it does is let you generate beautiful images from React components. I didn't build the underlying library, credit to Vercel there. Vercel built this library called Satori. And this plays into what we were talking about with SVGs really nicely, because what Satori, Vercel's library, does is let you create SVG graphics from React components.

React is a framework, a lot of websites use it, built by Facebook. It's a way to have dynamic, nice-looking, reactive components on the web. And Satori, if you use React components, it's almost like HTML: you put some classes, styles, colors. You put the X in the top left, an image in the bottom right.

Bilal Tahir (35:09.294)
Right.

Pierson Marks (35:12.992)
Satori takes that and converts it into an SVG graphic. And it's really cool, because I like coding, and I think code is very intuitive. You're like, hey, I have this box, I want the div centered in the middle of the screen, you know, the famous "center this div." You could put the title in the middle, the subtitle below it. And

that's awesome on a web page. But how do you go from "I created this code that looks great" to "I just want a screenshot of it, I just want an image of that"? There was not a great way to do that until this library.
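
Satori's real API is close to this sketch; the template content and font path are placeholders. Satori requires at least one font, and accepts either JSX or the plain { type, props } object form shown here:

```ts
import fs from "node:fs";
import satori from "satori";

const svg = await satori(
  // JSX-free element form, mirroring <div style={...}>...</div>
  {
    type: "div",
    props: {
      style: {
        width: "100%",
        height: "100%",
        display: "flex",
        flexDirection: "column",
        alignItems: "center",
        justifyContent: "center",
        background: "#111",
        color: "#fff",
      },
      children: [
        { type: "h1", props: { children: "Post title" } },
        { type: "p", props: { children: "Subtitle underneath" } },
      ],
    },
  },
  {
    width: 1200,
    height: 630,
    fonts: [
      {
        name: "Inter",
        data: fs.readFileSync("./Inter-Regular.ttf"), // any TTF/OTF on disk
        weight: 400,
        style: "normal",
      },
    ],
  }
);

// `svg` is a string: serve it directly, or rasterize to PNG (e.g. with resvg).
```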

Bilal Tahir (35:43.683)
Yeah.

Bilal Tahir (35:49.804)
Right. Yeah. Yeah.

It's super powerful, and the dynamic nature of it too. Originally, obviously, the use case was Open Graph images. What's an Open Graph image? Basically, whenever you paste a URL somewhere and you see a preview image, the way that renders is that in your code you have a meta tag with an Open Graph header that says: when this page loads, show this image. And before this, for every page, the naive way to do it

Pierson Marks (36:05.6)
preview.

Bilal Tahir (36:23.066)
is, let's say you have a blog: for every blog post, you would make a screenshot or create an image in Photoshop or whatever, download it into your code base, and point your code there. The cool thing about Satori is you can just say: okay, take the title of the blog post, take the description, and this is my template, like my cool gradient, and it goes in the bottom left corner or whatever, right? And you just

Pierson Marks (36:51.466)
Mm-hmm.

Bilal Tahir (36:53.022)
do that once, and it will dynamically generate the same cool-looking graphic with the title and description, and you never have to worry about it. And you can extend that to more than Open Graph. Maybe you want to create YouTube thumbnails or something. We do that for Jellypod: we create blog post cover images and stuff based on the title. You can just generate them with a button and not have to worry about creating them separately.
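
For context, this is roughly the pattern being described, sketched with @vercel/og (which wraps Satori) in a Next.js edge route; the route path and styling are placeholders:

```tsx
// pages/api/og.tsx -- each page's meta tag points here, e.g.:
// <meta property="og:image" content="https://example.com/api/og?title=My+Post" />
import { ImageResponse } from "@vercel/og";

export const config = { runtime: "edge" };

export default function handler(req: Request) {
  const title = new URL(req.url).searchParams.get("title") ?? "Untitled";
  return new ImageResponse(
    (
      <div
        style={{
          width: "100%",
          height: "100%",
          display: "flex",
          alignItems: "center",
          justifyContent: "center",
          fontSize: 64,
          color: "white",
          background: "linear-gradient(to right, #111, #333)",
        }}
      >
        {title}
      </div>
    ),
    { width: 1200, height: 630 } // standard Open Graph dimensions
  );
}
```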

Pierson Marks (36:58.378)
Right.

Bilal Tahir (37:22.928)
So very powerful.

Pierson Marks (37:23.742)
Right, right, right. It's super cool. The frustration I have: if you're a graphic designer, or just anybody familiar with Photoshop, you could have a template in Photoshop, say, the title of your blog post in the middle and the description underneath. And every time you want to create a new blog post or a new image or whatever, you take this template that looks nice. Hold on, I wonder if my phone's doing the same thing it did last time. Does it look all glitchy right now?

Bilal Tahir (37:55.828)
It's lagging a little, but I can still see you.

Pierson Marks (37:58.43)
Lagging. Weird. Hopefully it's chill. I'll cut this part out. But it's lagging, I wonder... I'll cut this. One second.

Pierson Marks (38:14.314)
This would suck if it just cuts out right here.

Bilal Tahir (38:16.77)
Yeah, you're about to deliver gold.

Pierson Marks (38:20.264)
Right. Do I sound laggy, or is it just the image? OK, OK, cool. So pretty much, if you know Photoshop, you can go to Photoshop, create a template, and every time you want to create a new image with that template, you just replace the title and the description text. But doing that by hand, it's...

Bilal Tahir (38:24.694)
It's just the image, you sound fine.

Pierson Marks (38:50.802)
Yeah, it's just tough to do, because the formatting gets all off. It takes time to go to Photoshop: you have to update the title and description and export it. And I was just frustrated. It was like, why do I have to go to Photoshop every time I want to create a new little image for our website? So I created this MCP server with Satori. And what that does is, now in my LLM, I can just go: hey, use this template that's already saved in the server, like the social

Bilal Tahir (38:56.078)
Hmm.

Pierson Marks (39:24.138)
card template, put the title and description in, and my ChatGPT will just go: OK, here you go. You just click it and it downloads. I'm just living in ChatGPT saying, hey, generate this image with this title, this subtext, or put this logo in. And it just works.

Bilal Tahir (39:41.912)
Right. I thought ChatGPT didn't have MCP. Do they support that now? So you can... yeah, Raycast. Oh, that's nice.

Pierson Marks (39:47.136)
Not ChatGPT, Raycast or Cursor. I was using ChatGPT as an example for when they support MCP. But it's super powerful, because I was able to iterate so quickly. I mean, there were times when I was creating these images and the description just looked weird, like the spacing was kind of off.

Bilal Tahir (39:57.654)
Yeah, very powerful.

Pierson Marks (40:11.476)
But now I just have this MCP server that's live. Anybody can use it, and I'll put the links in the show notes. If you add it to your Claude Code or your Claude desktop app, you can just use all these templates we've already built. They look nice. You say, hey, generate an image with this title and this description, and it just works, and you can just download it. It's great.

Bilal Tahir (40:26.701)
Right.

Bilal Tahir (40:34.422)
Yeah, I mean, having an LLM in the middle definitely helps a lot. And anyone who's tried to do captions has noticed this: you'd be surprised at how many edge cases

can appear, because one word is too long or something and suddenly it goes off the screen, or you have to wrap it, but then there are three words on the top line and one word below, and it doesn't look balanced at all. You know how to fix it by eye, but if you try to come up with programmatic rules, it's very hard to account for every variation of a sentence. I wonder, and this is something you sent me, I think I remember: one workflow I've seen is people will develop an MCP

Pierson Marks (40:47.988)
Right.

Bilal Tahir (41:15.472)
pipeline where they're taking a screenshot of their web page, sending it back to Claude Code or whatever, so the LLM can actually see their design and iterate on it. And I wonder if there's a workflow here where you generate the image, then send it back to the LLM so it does a design review: OK, for this sentence, maybe I need to shorten it, or make this font a little smaller.

Pierson Marks (41:36.128)
All right.

Totally. Yeah. I added the Playwright MCP into our code base, so we can do that. It should be able to look at what it's actually doing. Because, I mean, we write code, and just because you can read the code and write good code doesn't mean you know what it actually looks like. You wrote the code to look like this, but you don't know what it looks like rendered.
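
The screenshot half of that loop is a few lines with Playwright (the URL and output path are placeholders); the resulting image goes back to the model for the design review:

```ts
import { chromium } from "playwright";

const browser = await chromium.launch();
const page = await browser.newPage({ viewport: { width: 1280, height: 800 } });
await page.goto("http://localhost:3000/blog/my-post"); // placeholder URL
await page.screenshot({ path: "preview.png", fullPage: true });
await browser.close();

// Hand preview.png back to the LLM so it can critique spacing, overflow, and
// balance -- exactly the things that are invisible from the code alone.
```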

Bilal Tahir (41:59.98)
Right, Yeah. No, very popular.

Pierson Marks (42:02.826)
But yeah, I'll link the MCP server in the show notes. So if you use Claude or whatever, you can use it and just create some images. It was very helpful for me; it's probably already saved me an hour of time just creating these images. And there are some cool templates, and it's open for contributions. So if you want to add a template to the library, something we don't have, come and do it.

Bilal Tahir (42:26.498)
Yeah, do it. And if you just want to play around, I think there's a demo link there as well. You can just go generate SVGs and PNGs and stuff, just to see. I feel like that drives home the value of the tool.

Pierson Marks (42:41.854)
Right, right. Maybe I'll generate the thumbnail of this video with it. Maybe, we'll see. I could do it. But yeah, honestly, we're at like 40 minutes.

Bilal Tahir (42:47.751)
yeah, nice. That's pretty cool.

Bilal Tahir (42:56.717)
Yeah.

We're almost at time. Oh, I guess before we close off, let's do some good old-fashioned gossip. I found this hilarious, I don't know if you saw, but Meta has been poaching a lot of AI engineers. Zuck was in full blitz mode: all right, I'm going to put down hundred-million-dollar offers, I'm going to get the best people in, because we're behind. Because Llama 4 was a very disappointing launch, and it basically caused an emergency inside Meta leadership. They were like, okay, we've got to get our act together.

Pierson Marks (43:02.911)
Yes.

Bilal Tahir (43:27.216)
Alexandr Wang from Scale AI was hired to lead the team, and initially they seemed to be getting a lot of good talent from OpenAI, Anthropic, et cetera. But what happened last week was one engineer from OpenAI went to Meta, and then within a week went back to OpenAI. Then two more, like yesterday, one from OpenAI, one from Anthropic, went back.

So I don't know what's going on, but it reminds me of that Silicon Valley episode, if you've ever seen the show. There's this billionaire, Gavin, who's just trying to win at all costs, and he runs the same strategy against Pied Piper, the startup the show is based on. He wants to compete against them, so he hires all these stars. And one of the guys he hires as the leader shows up and says, all right, what do you have? And they show him. And the next scene is him

leaving, basically running away. So I wondered if something like that happened: these guys were promised, my god, you're going to build AGI, blah blah, and then they show up and they're like, fuck, our goal is just to create stepmom avatars for Facebook or whatever. So they're like, nope.

Pierson Marks (44:39.044)
Yeah, that went in a different direction than I thought you were going to go, I like that. That is true. Because I think it's very important when you're hiring talent that... well, first off, the people that went over there went for money reasons most of the time, which is fair, I think.

Bilal Tahir (44:50.99)
Right. Which is why it's even more surprising. It must have been really bad there for them to say, yeah, you know, $100 million, come to think of it, what's $100 million? Or maybe OpenAI finally matched. I don't know, there probably were some negotiations, and I'm sure they gave them something: all right, give me something, I can't just say no to $100 million. Give me 20, you know, whatever.

Pierson Marks (44:58.75)
Yeah, because the people there...

Pierson Marks (45:08.266)
Yeah.

Pierson Marks (45:12.096)
It will be interesting, though, because if OpenAI gives in to that negotiating power...

They'd probably just come back at the same salary, because if they don't, everybody else is like: yo, these guys left, they went to Meta, they took that offer, then they come back a week later and get a salary bump? That's kind of messed up. So that'll be an interesting cultural thing. I bet it's like what you said: either Meta was just super far behind in their approach and they're kind of running around like chickens with their heads cut off, in this limbo state. Maybe, I have no idea, so take this with a grain of salt. I'm very, very bullish on Meta, though.

Bilal Tahir (45:30.286)
Right, right, yeah.

Bilal Tahir (45:49.112)
Yeah, same.

Pierson Marks (45:49.14)
But it could also be that what they're doing and trying to build just isn't resonating as much with those researchers, like what you're saying: characters, avatars. I think there's a lot of opportunity in the advertising space, and I think that's what Meta is going to make a trillion dollars with in the next decade or so: advertising.

Bilal Tahir (46:09.666)
Right, I mean, it's just one of those things you have to do. This is their product, so it totally makes sense. Yeah.

Pierson Marks (46:15.18)
Right. So it'll be interesting. Do you think this is going to continue next week? Do you think next week when we do this, more people will have left? Or is it a little blip and we go, OK...

Bilal Tahir (46:22.775)
I mean...

I have no insight into what's going on in there. But like you, I would say I'm bullish on Meta. I just think they have their distribution network. I don't think you should bet against Zuck. I actually think he's an amazing CEO and leader. Despite all the fun people make of him, I think he's always been very bold and made the right decision, and that's why Meta is where it is today. People don't understand, he's been doing this forever. He bought Instagram for a billion dollars. When he did that, people were like, are you insane? That's an insane amount of money to pay for a 10-person

startup that just does filters. What the hell are you thinking? Just build it yourself. And you know, now it's called one of the steals of the century. Instagram alone is worth a hundred-plus, 200 billion dollars. Same with WhatsApp. He bought it for 19 billion, and people were like, are you insane? What are you doing? And WhatsApp, especially in the age of privacy and encrypted security, is one of the most important networks in the world. He saw it. And same with AI: people are saying, are you insane? Why are you spending hundreds of billions of dollars on this? And I know his VR bet hasn't

Pierson Marks (46:57.184)
All right.

Bilal Tahir (47:24.714)
paid off yet either, but I think it's a similar thing: five, ten years from now, people will be like, yeah, I'm glad he was at the helm and had the executive authority to make those decisions. He's always been very bold. And the way he thinks of it, I think, is as a percentage of market cap. He says: you think of it as a hundred billion dollars; I think of it as one percent of my company's market cap. Would I be willing to bet one percent of my market cap to get to AGI first? Fuck yes. So it's a completely

Pierson Marks (47:51.968)
Sure. Yeah.

Bilal Tahir (47:54.586)
good bet. I think it's a very sound bet, and I think it'll pay off for them, because they have their distribution network. They don't need to be the leading lab; they just need a good enough AI and to be able to roll it out in a way that adds more value to their network effects, which they own.

Pierson Marks (48:10.976)
And who are the users of Facebook and Instagram? It's consumers. The people using Anthropic and OpenAI, maybe... everyone's models are going to be able to code, they're going to have logical reasoning and things like that. That's just the foundation. But for Meta, I mean, the difference between GPT-5 and GPT-6 in terms of abilities is going to be so negligible for the vast majority of people communicating with it via Instagram. You're probably just talking to a character. You just want that character to not sound

weird. You're having a conversation, and it's not going to need this crazy AGI stuff. So maybe they're like, hey, we don't want to be dependent on these other labs, and then you have the distribution networks, right?

Bilal Tahir (48:41.859)
Right.

Bilal Tahir (48:53.251)
Yeah.

Yeah, I think so. I think they're in a great position. And I think all the hyperscalers, all these OG companies, are in a great position. The other company, and we'll close off after this, I do have some news to share: as anyone who knows me knows, I am a huge Google bull, and I remain a Google bull. We talked about Nano Banana. Hopefully Gemini 3 will come out soon, and I think that's going to kill. They're just killing it. They have the best image editing model now, they have Veo 3, and

hopefully Gemini 3 will be at a GPT-5-plus level. Most people still think Google is behind and losing the AI race, and I don't think so. If you look at search, they've basically been able to get the cost of an AI search query down to almost the cost of a regular search query. It's basically almost there. And the AI Overviews, I love the AI Overviews. So I think Google is in a great position. And finally, what I'll share, I can't share who told me this,

because it's insider knowledge, but apparently they are right now deploying Veo 4, which means they've actually already trained it, and they're deploying it, which means we will hopefully very soon see something a lot better than Veo 3, and Veo 3 is already amazing. That was very surprising to me, because I thought it would be a while before we saw Veo 4. But yeah, maybe they'll keep...

Pierson Marks (49:56.35)
Hahaha

Pierson Marks (50:06.72)
Well,

Pierson Marks (50:20.224)
Oh, that's breaking news on Creative Flux. You get the inside scoop, you get the banana, you get all of it right here on Creative Flux.

Bilal Tahir (50:24.376)
There you go, yeah.

Bilal Tahir (50:32.354)
Yeah, so hopefully they don't hide it for a while, but yeah, they're deploying it.

Pierson Marks (50:37.672)
Nice, that's awesome. On that note, episode 11 is wrapped up, and we'll see you all next week. Cool, bye.

Bilal Tahir (50:46.478)
All right, take care guys, bye.