swyx backup pod

https://www.listennotes.com/podcasts/the-logan-bartlett/ep-46-stability-ai-ceo-emad-8PQIYcR3r2i/


[00:00:00.000 --> 00:00:02.580]   (upbeat music)
[00:00:02.580 --> 00:00:07.560]   - Welcome to the 46th episode of Cartoon Avatars.
[00:00:07.560 --> 00:00:09.840]   I am your host Logan Bartlett.
[00:00:09.840 --> 00:00:11.960]   Welcome back for break.
[00:00:11.960 --> 00:00:13.280]   Thanks everyone for bearing with us
[00:00:13.280 --> 00:00:16.160]   as we took a pause over the last couple of weeks.
[00:00:16.160 --> 00:00:18.200]   We're excited for this episode.
[00:00:18.200 --> 00:00:19.880]   This, what you're gonna hear on this episode
[00:00:19.880 --> 00:00:22.480]   is a conversation that I had with Emad Mostaque.
[00:00:22.480 --> 00:00:27.400]   And Emad is the founder and CEO of Stability AI,
[00:00:27.400 --> 00:00:29.840]   which is the largest contributor to Stable Diffusion.
[00:00:29.840 --> 00:00:32.280]   Stable Diffusion is the fastest growing
[00:00:32.280 --> 00:00:34.120]   open source project of all time.
[00:00:34.120 --> 00:00:39.080]   It's one of the leading platforms in generative AI.
[00:00:39.080 --> 00:00:41.720]   And Emad and I had a really interesting conversation
[00:00:41.720 --> 00:00:43.080]   about a bunch of different things,
[00:00:43.080 --> 00:00:47.720]   but we dive into the state of artificial intelligence today,
[00:00:47.720 --> 00:00:50.920]   why this is possible, when it wasn't in the past,
[00:00:50.920 --> 00:00:53.160]   where this is going in the future,
[00:00:53.160 --> 00:00:56.760]   how he differentiates versus competitors like OpenAI.
[00:00:56.760 --> 00:00:59.560]   Really fun conversation and appreciate him
[00:00:59.560 --> 00:01:01.120]   for powering through.
[00:01:01.120 --> 00:01:02.760]   He was a little sick as we were doing this.
[00:01:02.760 --> 00:01:07.760]   So it was a fun conversation and I appreciate him doing it with me.
[00:01:07.760 --> 00:01:09.840]   And so before you hear that,
[00:01:09.840 --> 00:01:11.560]   we talked a little bit about this before break,
[00:01:11.560 --> 00:01:13.240]   but we are gonna make a more concerted effort
[00:01:13.240 --> 00:01:16.440]   to get people to like and subscribe
[00:01:16.440 --> 00:01:20.240]   and share and review the podcast itself.
[00:01:20.240 --> 00:01:22.560]   And so if you're whatever platform you're listening on,
[00:01:22.560 --> 00:01:25.720]   if it's YouTube, if it's Spotify, if it's Apple,
[00:01:25.720 --> 00:01:29.040]   whatever it is, if people could go ahead and like
[00:01:29.040 --> 00:01:31.640]   and subscribe and leave a review,
[00:01:31.640 --> 00:01:33.880]   share with a friend, all of that stuff.
[00:01:33.880 --> 00:01:35.480]   We're trying to figure out exactly what direction
[00:01:35.480 --> 00:01:36.640]   to take this in.
[00:01:36.640 --> 00:01:40.080]   And so that validation and feedback
[00:01:40.080 --> 00:01:42.520]   and also the growth that comes along with all that stuff
[00:01:42.520 --> 00:01:44.880]   is super appreciated.
[00:01:44.880 --> 00:01:46.880]   It's not something we had been comfortable
[00:01:46.880 --> 00:01:48.240]   asking for to date,
[00:01:48.240 --> 00:01:51.080]   but as we kind of figure out what direction we're gonna go,
[00:01:51.080 --> 00:01:54.840]   we'd love to see more shares, more reviews, more views,
[00:01:54.840 --> 00:01:56.200]   more likes, all that stuff.
[00:01:56.200 --> 00:02:00.520]   So really appreciate everyone's support in doing that.
[00:02:00.520 --> 00:02:02.520]   And so without further delay,
[00:02:02.520 --> 00:02:04.280]   what you're gonna hear now is the conversation with me
[00:02:04.280 --> 00:02:06.880]   and Emad Mostaque from Stability AI.
[00:02:06.880 --> 00:02:09.920]   All right, Emad Mostaque.
[00:02:09.920 --> 00:02:10.800]   Did I say that right?
[00:02:10.800 --> 00:02:12.120]   - Yep. - Perfect.
[00:02:12.120 --> 00:02:13.680]   Thank you for doing this.
[00:02:13.680 --> 00:02:16.560]   Founder of Stability AI,
[00:02:16.560 --> 00:02:20.480]   one of the main contributors to stable diffusion.
[00:02:20.480 --> 00:02:23.480]   Thank you for coming on here today.
[00:02:23.480 --> 00:02:24.440]   - A pleasure, Logan.
[00:02:24.440 --> 00:02:25.960]   Most happy to be here.
[00:02:25.960 --> 00:02:26.800]   - Yeah, totally.
[00:02:26.800 --> 00:02:30.120]   So maybe at a highest level, we can start off with
[00:02:30.120 --> 00:02:32.560]   what is generative AI?
[00:02:32.560 --> 00:02:34.760]   How would you define that for the average person?
[00:02:34.760 --> 00:02:37.880]   - So I think everyone's heard of kind of the concept
[00:02:37.880 --> 00:02:40.560]   of big data, 'cause the whole of the internet previously
[00:02:40.560 --> 00:02:41.680]   ran on big data.
[00:02:41.680 --> 00:02:44.640]   Large, large models built by Google and Facebook
[00:02:44.640 --> 00:02:47.000]   and others to basically target you ads,
[00:02:47.000 --> 00:02:49.200]   'cause ads were the main part of that.
[00:02:49.200 --> 00:02:51.640]   And these models extrapolated.
[00:02:51.640 --> 00:02:54.480]   So they had a generalized model of what a person was like
[00:02:54.480 --> 00:02:56.600]   and then your specific interests,
[00:02:56.600 --> 00:03:00.480]   like Emad likes green hoodies or Logan likes black jumpers.
[00:03:00.480 --> 00:03:05.520]   That then extended from the previous thing to what the next thing was.
[00:03:05.520 --> 00:03:08.080]   They're like extrapolation models, inferring what was there.
[00:03:08.080 --> 00:03:09.560]   Generative models are a bit different
[00:03:09.560 --> 00:03:12.160]   in that they learn principles from structured
[00:03:12.160 --> 00:03:16.080]   and unstructured data and then they can generate new things
[00:03:16.080 --> 00:03:17.400]   based on those principles.
[00:03:17.400 --> 00:03:21.680]   So you could ask it to write an essay about bubble sort
[00:03:21.680 --> 00:03:24.840]   or a sonnet about Shakespeare, anything
[00:03:24.840 --> 00:03:26.840]   which is digital, so it can do that.
[00:03:26.840 --> 00:03:29.960]   Or in the case of some of the work that we're most famous for
[00:03:29.960 --> 00:03:32.800]   you enter in a labradoodle with a hat
[00:03:32.800 --> 00:03:34.960]   and a stained glass window and it understands that
[00:03:34.960 --> 00:03:37.200]   and then creates that in a few seconds.
[00:03:37.200 --> 00:03:39.360]   So I'd say that's probably the biggest difference
[00:03:39.360 --> 00:03:40.880]   between this new type of generative AI
[00:03:40.880 --> 00:03:43.040]   and then that old type of AI.
[00:03:43.040 --> 00:03:44.880]   So the other way that I say it is that we've moved
[00:03:44.880 --> 00:03:47.720]   from a big data era to more of a big model era
[00:03:47.720 --> 00:03:50.600]   'cause these models are very difficult to create, train
[00:03:50.600 --> 00:03:53.480]   which is why only a few companies such as ours do it.
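
(A brief aside: the text-to-image flow described here, typing a prompt like "a labradoodle with a hat in a stained glass window" and getting a picture back, can be reproduced with the openly released Stable Diffusion weights. Below is a minimal, hedged sketch using the Hugging Face diffusers library; the library choice, the stabilityai/stable-diffusion-2-1 model ID, and the assumption of an NVIDIA GPU are assumptions, not anything specified in the conversation.)

import torch
from diffusers import StableDiffusionPipeline

# Load the publicly released Stable Diffusion 2.1 weights (assumed model ID).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")  # assumes an NVIDIA GPU is available

# Generate the example prompt from the conversation and save the result.
image = pipe("a labradoodle with a hat in a stained glass window").images[0]
image.save("labradoodle.png")
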
[00:03:53.480 --> 00:03:57.800]   - And the key point there is the predictive nature
[00:03:57.800 --> 00:04:01.560]   of these models and the ability to actually take
[00:04:01.560 --> 00:04:05.760]   not just what was given to it but also act on all
[00:04:05.760 --> 00:04:08.840]   the other things to sort of self-create in that way.
[00:04:08.840 --> 00:04:11.280]   Or what was the distinction that you were drawing there?
[00:04:11.280 --> 00:04:16.280]   - Yeah, so like if you built a dog classifier previously
[00:04:16.280 --> 00:04:19.840]   and then a new type of dog species came along
[00:04:20.720 --> 00:04:23.440]   then the classifier wouldn't be able to understand it
[00:04:23.440 --> 00:04:25.480]   'cause it didn't really understand the concept of a dog.
[00:04:25.480 --> 00:04:28.600]   - It was just responding to all the history of things
[00:04:28.600 --> 00:04:31.240]   that it had been fed in terms of, I guess,
[00:04:31.240 --> 00:04:33.280]   fed is an interesting analogy for a dog here
[00:04:33.280 --> 00:04:35.880]   but all the history of data, it didn't have any
[00:04:35.880 --> 00:04:37.880]   understanding that it was a dog, it was just, hey,
[00:04:37.880 --> 00:04:41.320]   here's all the parameters around which this thing
[00:04:41.320 --> 00:04:42.160]   seems defined.
[00:04:42.160 --> 00:04:43.800]   - Is dog-like, yeah.
[00:04:43.800 --> 00:04:46.320]   You know, this is one of the issues with self-driving cars.
[00:04:46.320 --> 00:04:49.200]   Like you have this whole world of things that you're used to
[00:04:49.200 --> 00:04:50.720]   but then what happens if something happens
[00:04:50.720 --> 00:04:52.880]   that is not in the training set, right?
[00:04:52.880 --> 00:04:55.360]   So in 2017 there was a breakthrough
[00:04:55.360 --> 00:04:57.000]   in what's known as deep learning,
[00:04:57.000 --> 00:04:58.800]   with this paper, "Attention Is All You Need,"
[00:04:58.800 --> 00:05:00.680]   about how to get an AI to pay attention
[00:05:00.680 --> 00:05:03.920]   to the important things as opposed to just everything.
[00:05:03.920 --> 00:05:05.880]   So we moved from just analyzing everything
[00:05:05.880 --> 00:05:08.200]   to analyzing the important things.
[00:05:08.200 --> 00:05:10.080]   And this is what's led to a lot of the transformative
[00:05:10.080 --> 00:05:13.800]   breakthroughs that have allowed AI to get, in very narrow areas,
[00:05:13.800 --> 00:05:16.280]   to human levels of performance in writing, reading,
[00:05:16.280 --> 00:05:19.440]   playing Go, playing StarCraft, all sorts of things,
[00:05:19.440 --> 00:05:21.560]   protein folding, et cetera as well.
[00:05:21.560 --> 00:05:24.600]   And other breakthroughs that are beyond human, as it were.
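
(For readers who want to see what "paying attention to the important things" means in code, here is a minimal NumPy sketch of the scaled dot-product attention operation from the 2017 "Attention Is All You Need" paper mentioned above. The shapes and random values are illustrative assumptions only.)

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score how relevant each key is to each query, scaled by dimension.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of the values: attend to what matters most.
    return weights @ V

# Toy example: 3 tokens with 4-dimensional representations.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # -> (3, 4)
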
[00:05:24.600 --> 00:05:29.000]   - And so it feels to me and probably the average person
[00:05:29.000 --> 00:05:31.560]   that this all came out of nowhere
[00:05:31.560 --> 00:05:35.840]   but we've had sort of incremental progress in AI
[00:05:35.840 --> 00:05:38.360]   over the course of the last 20 years or so.
[00:05:38.360 --> 00:05:41.760]   Can you give a quick primer on like back to the days
[00:05:41.760 --> 00:05:46.360]   of Deep Blue and chess and DeepMind and Go?
[00:05:46.360 --> 00:05:48.680]   Like what have been in your mind
[00:05:48.680 --> 00:05:51.160]   the historical points along the way
[00:05:51.160 --> 00:05:53.320]   that sort of led to this avalanche
[00:05:53.320 --> 00:05:55.800]   that feels like it has just happened?
[00:05:55.800 --> 00:05:56.880]   - Yeah, so like machine learning
[00:05:56.880 --> 00:05:58.320]   was kind of the classical paradigm.
[00:05:58.320 --> 00:05:59.480]   Actually one way to think about it as well
[00:05:59.480 --> 00:06:00.680]   is that you got two parts of your brain,
[00:06:00.680 --> 00:06:02.240]   the part of your brain that jumps to conclusions
[00:06:02.240 --> 00:06:03.280]   and the logical part.
[00:06:03.280 --> 00:06:05.080]   So one sees the world as it is, and the other is fairly crap,
[00:06:05.080 --> 00:06:06.880]   like, "there's a tiger in the bush," right?
[00:06:06.880 --> 00:06:10.160]   So classical AI was the more logical kind of way
[00:06:10.160 --> 00:06:12.520]   and it was based on more and more and more data,
[00:06:12.520 --> 00:06:13.680]   again big data.
[00:06:13.680 --> 00:06:16.240]   So when Deep Blue beat Garry Kasparov
[00:06:16.240 --> 00:06:18.440]   it's because it could think more moves ahead of him.
[00:06:18.440 --> 00:06:22.120]   It just did pure crunching of the numbers.
[00:06:22.120 --> 00:06:25.000]   It looked at every chess match and then it outperformed him.
[00:06:25.000 --> 00:06:27.200]   And eventually people knew it would get to that point
[00:06:27.200 --> 00:06:29.240]   but they didn't think it would happen quite so quickly.
[00:06:29.240 --> 00:06:32.440]   - And that was in 1996, '97-ish
[00:06:32.440 --> 00:06:35.920]   and basically chess is kind of a constrained game
[00:06:35.920 --> 00:06:36.760]   in a lot of ways.
[00:06:36.760 --> 00:06:39.000]   Like there's only so many moves that can be made
[00:06:39.000 --> 00:06:41.440]   and so you can computationally run through the history
[00:06:41.440 --> 00:06:42.960]   of all of the moves and figure out
[00:06:42.960 --> 00:06:44.680]   what the next best action is.
[00:06:44.680 --> 00:06:46.880]   - And then also look at all of the previous moves.
[00:06:46.880 --> 00:06:50.080]   So Garry Kasparov I believe could think five, six moves ahead
[00:06:50.080 --> 00:06:52.640]   but then Deep Blue could think seven, eight moves ahead
[00:06:52.640 --> 00:06:54.840]   and it was just a giant supercomputer literally,
[00:06:54.840 --> 00:06:56.600]   you know, that kind of beat him.
[00:06:56.600 --> 00:06:58.960]   And I was like, okay, that's the case.
[00:06:58.960 --> 00:07:02.280]   Then humans started playing with computers
[00:07:02.280 --> 00:07:03.920]   and computers started playing with humans
[00:07:03.920 --> 00:07:07.000]   and now the best chess players are humans working
[00:07:07.000 --> 00:07:08.840]   with computers which is very interesting.
[00:07:08.840 --> 00:07:11.600]   But it was again this very defined space.
[00:07:11.600 --> 00:07:16.600]   By contrast Go was a game that people thought
[00:07:16.600 --> 00:07:18.520]   couldn't be beaten by this mechanism
[00:07:18.520 --> 00:07:20.640]   because a Go board Chinese chess
[00:07:20.640 --> 00:07:24.320]   has too many computational possibilities.
[00:07:24.320 --> 00:07:26.520]   So you can't think X moves ahead
[00:07:26.520 --> 00:07:29.360]   because you just get exponentially more compute required.
[00:07:29.360 --> 00:07:33.600]   So DeepMind, a research lab out of London
[00:07:33.600 --> 00:07:37.240]   now owned by Google built a system called AlphaGo
[00:07:37.240 --> 00:07:40.600]   that created a computer that again, learned principles
[00:07:40.600 --> 00:07:42.520]   and actually played against itself.
[00:07:42.520 --> 00:07:44.200]   So they didn't even look at historical games
[00:07:44.200 --> 00:07:45.600]   for the later versions.
[00:07:45.600 --> 00:07:48.440]   They pitted it against Lee Sedol who was,
[00:07:48.440 --> 00:07:49.960]   I think, seventh or ninth dan.
[00:07:49.960 --> 00:07:52.840]   He was basically like the Magnus Carlsen of Go.
[00:07:52.840 --> 00:07:54.640]   So far ahead of everyone and everyone's like,
[00:07:54.640 --> 00:07:56.360]   there's no way they can beat him.
[00:07:56.360 --> 00:07:59.000]   It drew with him once and beat him like another seven times
[00:07:59.000 --> 00:08:01.120]   and I was like, wait, what?
[00:08:01.120 --> 00:08:03.120]   And it did that without those massive levels
[00:08:03.120 --> 00:08:04.440]   of number crunching 'cause it learned
[00:08:04.440 --> 00:08:06.160]   what was important in moves
[00:08:06.160 --> 00:08:07.920]   and the principles behind moves.
[00:08:07.920 --> 00:08:10.720]   - Yeah, it was 2016 when this happened.
[00:08:10.720 --> 00:08:13.240]   - Exactly, that was kind of the reinforcement,
[00:08:13.240 --> 00:08:15.920]   self-supervised learning, which was one component of this,
[00:08:15.920 --> 00:08:17.720]   before the deep learning
[00:08:17.720 --> 00:08:20.720]   and transformer-based attention learning, which was 2017,
[00:08:20.720 --> 00:08:22.400]   which is the next step above that.
[00:08:22.400 --> 00:08:25.000]   So there's a few things all happening at the same time
[00:08:25.000 --> 00:08:28.040]   along with an exponential. Again, I'm a mathematician,
[00:08:28.040 --> 00:08:29.640]   and these exponentials are literally,
[00:08:29.640 --> 00:08:31.440]   a lot of these things look like exponentials
[00:08:31.440 --> 00:08:32.880]   'cause they are exponentials
[00:08:32.880 --> 00:08:35.320]   in increasing compute availability.
[00:08:35.320 --> 00:08:39.200]   So what happened there then is very interesting.
[00:08:39.200 --> 00:08:42.000]   Since that point, where everyone's like,
[00:08:42.000 --> 00:08:44.880]   holy crap, he got beat, his level has gone up in Go,
[00:08:44.880 --> 00:08:46.040]   but then so has everyone else's.
[00:08:46.040 --> 00:08:48.480]   So if you look at the average level of Go players,
[00:08:48.480 --> 00:08:50.200]   it had been flat for about three decades
[00:08:50.200 --> 00:08:53.040]   and now it's shot up, because the computer could think
[00:08:53.040 --> 00:08:55.040]   in brand new ways and find new, principled
[00:08:55.040 --> 00:08:56.920]   ways to do it, and now humans plus computers
[00:08:56.920 --> 00:08:58.160]   got even better.
[00:08:58.160 --> 00:09:01.000]   Then there was the transformer based architecture paper.
[00:09:01.000 --> 00:09:01.920]   I'm skipping over a lot.
[00:09:01.920 --> 00:09:04.640]   There's a lot of stuff happening in deep learning
[00:09:04.640 --> 00:09:07.040]   and it was this attention based system whereby it paid
[00:09:07.040 --> 00:09:10.280]   attention to the most important parts of a given data set
[00:09:10.280 --> 00:09:14.360]   that led to breakthroughs like GPT-3 in 2020.
[00:09:14.360 --> 00:09:18.600]   GPT-3 is a model by OpenAI which is a research lab
[00:09:18.600 --> 00:09:21.200]   primarily backed by Microsoft focused on
[00:09:21.200 --> 00:09:22.520]   artificial general intelligence.
[00:09:22.520 --> 00:09:25.520]   So how do you make an AI that can do just about anything?
[00:09:25.520 --> 00:09:28.920]   That could write like a human.
[00:09:28.920 --> 00:09:31.000]   So you give it, like, Legolas and Gimli
[00:09:31.000 --> 00:09:33.040]   and it'll write a whole story in the style
[00:09:33.040 --> 00:09:34.600]   of Lord of the Rings.
[00:09:34.600 --> 00:09:36.400]   But what it does is basically it guesses
[00:09:36.400 --> 00:09:38.560]   what the next word in a sentence is
[00:09:38.560 --> 00:09:40.920]   from a giant corpus of text,
[00:09:40.920 --> 00:09:42.840]   actually not big, a few terabytes,
[00:09:42.840 --> 00:09:44.200]   a few thousand gigabytes,
[00:09:44.200 --> 00:09:47.360]   that was then run on a gigantic supercomputer.
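
(To make "guess the next word" concrete, here is a hedged sketch using the small public GPT-2 checkpoint through the Hugging Face transformers library as a stand-in, since GPT-3 itself is only reachable through OpenAI's API; the prompt and generation settings are arbitrary choices.)

from transformers import pipeline

# GPT-2 stands in for the much larger GPT-3 discussed above; same idea,
# repeatedly predicting the most likely next token given the text so far.
generator = pipeline("text-generation", model="gpt2")

out = generator(
    "Deep Blue beat Garry Kasparov because",
    max_new_tokens=20,
    num_return_sequences=1,
)
print(out[0]["generated_text"])
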
[00:09:47.360 --> 00:09:50.560]   So supercomputers kind of had a linear increase
[00:09:50.560 --> 00:09:53.080]   in their capabilities over the years.
[00:09:53.080 --> 00:09:55.360]   And you see things like the Apollo landing
[00:09:55.360 --> 00:09:59.240]   is like same compute as your iPhone, right?
[00:09:59.240 --> 00:10:00.840]   But that was still quite linear.
[00:10:00.840 --> 00:10:03.880]   Over the last few years led by kind of Nvidia
[00:10:03.880 --> 00:10:05.480]   and these GPU moments,
[00:10:05.480 --> 00:10:08.400]   you've had an exponential increase in supercomputers.
[00:10:08.400 --> 00:10:10.560]   And these models lend themselves to,
[00:10:10.560 --> 00:10:13.560]   you take a relatively small amount of data,
[00:10:13.560 --> 00:10:17.200]   like the text writings of the whole of arXiv
[00:10:17.200 --> 00:10:20.480]   or PubMed or a scrape of the internet
[00:10:20.480 --> 00:10:25.240]   or like a billion images with captions.
[00:10:25.240 --> 00:10:26.920]   And then you put it into the supercomputer
[00:10:26.920 --> 00:10:28.920]   and the supercomputer looks at the connections
[00:10:28.920 --> 00:10:31.880]   between the words and the images or the words in a sentence
[00:10:31.880 --> 00:10:33.000]   and how they line up
[00:10:33.000 --> 00:10:35.360]   to figure out what should come next.
[00:10:35.360 --> 00:10:37.000]   So this was the big breakthrough in that
[00:10:37.000 --> 00:10:39.680]   you didn't actually have to build a custom algorithm
[00:10:39.680 --> 00:10:41.440]   for everything anymore.
[00:10:41.440 --> 00:10:43.400]   There was one set of algorithms
[00:10:43.400 --> 00:10:46.960]   like you had to do a good level of customization,
[00:10:46.960 --> 00:10:49.200]   but the key edge and key differential
[00:10:49.200 --> 00:10:51.160]   was no longer how big is your data set,
[00:10:51.160 --> 00:10:54.280]   you know, or seeing how customers use the data set,
[00:10:54.280 --> 00:10:56.400]   it was just how much compute do you have.
[00:10:56.400 --> 00:10:58.520]   So more and more compute was applied to these models
[00:10:58.520 --> 00:10:59.480]   and then they just broke through
[00:10:59.480 --> 00:11:01.520]   and they got bigger and bigger and bigger.
[00:11:01.520 --> 00:11:06.520]   So like GPT-3 was a 175 billion parameter model.
[00:11:06.520 --> 00:11:10.600]   That's, you could say, the kind of number of things
[00:11:10.600 --> 00:11:11.520]   that it knows.
[00:11:11.520 --> 00:11:15.840]   And then it got to 500 billion and then bigger and bigger.
[00:11:15.840 --> 00:11:18.320]   These large language models as they were called
[00:11:18.320 --> 00:11:22.600]   that could be human level in answering questions, you know?
[00:11:22.600 --> 00:11:24.760]   And this technology started to proliferate
[00:11:24.760 --> 00:11:26.720]   because when you get to human level it was great,
[00:11:26.720 --> 00:11:30.880]   but it didn't proliferate that fast because it was slow
[00:11:30.880 --> 00:11:32.200]   and it was expensive.
[00:11:32.200 --> 00:11:33.920]   And it required a lot of technical expertise
[00:11:33.920 --> 00:11:36.600]   to even run these models, let alone create them.
[00:11:36.600 --> 00:11:39.360]   And again, like the supercompute levels
[00:11:39.360 --> 00:11:43.600]   are just beyond belief to try and get to.
[00:11:43.600 --> 00:11:46.200]   At the start of last year on the image side,
[00:11:46.200 --> 00:11:49.360]   which is one of the areas that we're focused on,
[00:11:49.360 --> 00:11:52.440]   OpenAI released something interesting called CLIP,
[00:11:52.440 --> 00:11:54.600]   which was an image to text model.
[00:11:54.600 --> 00:11:58.040]   So you could also generate text descriptions of images.
[00:11:58.040 --> 00:12:00.840]   There were some generative models before that.
[00:12:00.840 --> 00:12:02.320]   And so you had a generative model
[00:12:02.320 --> 00:12:04.040]   and then you had a model that could tell you
[00:12:04.040 --> 00:12:05.560]   what a generation was.
[00:12:05.560 --> 00:12:08.880]   And so a bunch of groups came together and said,
[00:12:08.880 --> 00:12:10.640]   "What if you bounce them off each other?
[00:12:10.640 --> 00:12:12.000]   One that tells you what an image is
[00:12:12.000 --> 00:12:14.800]   and one that tells you how to generate an image,
[00:12:14.800 --> 00:12:16.600]   you could converge to better images."
[00:12:16.600 --> 00:12:18.640]   And that's what kicked off this whole image revolution.
[00:12:18.640 --> 00:12:21.280]   - Your conversion is from language.
[00:12:21.280 --> 00:12:24.320]   So just text based or speech based or whatever,
[00:12:24.320 --> 00:12:25.720]   into images.
[00:12:25.720 --> 00:12:28.880]   You're bringing two different modalities together.
[00:12:28.880 --> 00:12:30.640]   - Yeah, so the two models bounced off each other.
[00:12:30.640 --> 00:12:33.320]   So it'd be like a dog in a stained glass window
[00:12:33.320 --> 00:12:34.440]   and it produced a version.
[00:12:34.440 --> 00:12:37.120]   And then the image to text model would be like,
[00:12:37.120 --> 00:12:38.640]   "Ah, that's not that good,
[00:12:38.640 --> 00:12:41.840]   it looks a bit like that, but not quite the prompt,"
[00:12:41.840 --> 00:12:43.240]   and it made adjustments.
[00:12:43.240 --> 00:12:45.520]   Then it went back and forth, back and forth, back and forth.
[00:12:45.520 --> 00:12:47.160]   So you got something that looked a little bit
[00:12:47.160 --> 00:12:49.600]   like a dog in a stained glass window.
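
(A minimal sketch of the "bounce them off each other" loop he describes: the scoring half can be done with the publicly released OpenAI CLIP checkpoint, which rates how well a candidate image matches the prompt. The model name below is the real released checkpoint; the candidate file name is a placeholder, and in a full CLIP-guided generator this score, or its gradient, would be fed back to nudge the image before repeating.)

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the released CLIP image-text model and its preprocessor.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a dog in a stained glass window"
image = Image.open("candidate.png")  # placeholder: a candidate from some generator

# Higher logits mean the image matches the text better.
inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    score = model(**inputs).logits_per_image.item()
print(f"CLIP match score: {score:.2f}")
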
[00:12:49.600 --> 00:12:52.400]   And then teams around the world,
[00:12:52.400 --> 00:12:54.400]   led by a lot of people at Stability
[00:12:54.400 --> 00:12:57.800]   and OpenAI and Meta and some other places,
[00:12:57.800 --> 00:13:00.400]   started to think, "How can we really crack through this?"
[00:13:00.400 --> 00:13:02.840]   And it just went a bit crazy to get to the point now
[00:13:02.840 --> 00:13:04.880]   that you can generate a photorealistic image
[00:13:04.880 --> 00:13:06.440]   of anything in about a second.
[00:13:06.440 --> 00:13:09.520]   And again, this is part of the exponentials.
[00:13:09.520 --> 00:13:11.760]   Like the amount of compute that we're using now
[00:13:11.760 --> 00:13:14.360]   as a private company that's 14 months old,
[00:13:14.360 --> 00:13:17.160]   is 10 times the compute of NASA put together.
[00:13:17.160 --> 00:13:20.760]   Or 10 times the compute of the fastest supercomputer in the UK.
[00:13:20.760 --> 00:13:22.200]   It would have been the fastest supercomputer
[00:13:22.200 --> 00:13:24.480]   in the world just 60 years ago.
[00:13:24.480 --> 00:13:26.560]   So for a private company to be able to access that
[00:13:26.560 --> 00:13:27.640]   is a bit insane.
[00:13:27.640 --> 00:13:30.520]   - And how did you, you have a little bit of an unusual path
[00:13:30.520 --> 00:13:34.000]   into this?
[00:13:34.000 --> 00:13:37.480]   So can you talk through like how you actually ended up
[00:13:37.480 --> 00:13:39.480]   at the forefront of a lot of these things?
[00:13:39.480 --> 00:13:42.000]   - Oh, so yeah, I was quite lucky through life.
[00:13:42.000 --> 00:13:44.920]   I was a hedge fund manager, I was a video game investor.
[00:13:44.920 --> 00:13:47.720]   I took a break when my son was diagnosed with autism.
[00:13:47.720 --> 00:13:49.800]   And I realized that AI could be used
[00:13:49.800 --> 00:13:51.520]   to try and solve some of these things.
[00:13:51.520 --> 00:13:53.480]   This is the old school AI,
[00:13:53.480 --> 00:13:55.960]   because for people who know about autism,
[00:13:55.960 --> 00:13:57.360]   autism spectrum disorder,
[00:13:57.360 --> 00:13:59.520]   there's no official treatment or cure,
[00:13:59.520 --> 00:14:01.800]   or nobody knows actually what causes it.
[00:14:01.800 --> 00:14:04.440]   And so I was like, what if we did an analysis
[00:14:04.440 --> 00:14:06.520]   of all the different things that people think
[00:14:06.520 --> 00:14:07.800]   cause it and try and figure out
[00:14:07.800 --> 00:14:10.560]   some commonalities with AI?
[00:14:10.560 --> 00:14:13.120]   And then we identified some things in the brain,
[00:14:13.120 --> 00:14:14.360]   GABA and glutamate balance.
[00:14:14.360 --> 00:14:15.560]   So GABA calms you down,
[00:14:15.560 --> 00:14:16.960]   and glutamate makes you excited.
[00:14:16.960 --> 00:14:20.280]   So when you need to take Valium, your GABA goes up.
[00:14:20.280 --> 00:14:21.400]   When your brain is too excited,
[00:14:21.400 --> 00:14:23.080]   it's like when you're tapping your leg
[00:14:23.080 --> 00:14:26.120]   and you can't focus and pay attention to things, right?
[00:14:26.120 --> 00:14:29.880]   And so kids with ASD are often like that,
[00:14:29.880 --> 00:14:32.000]   in that they can't pay attention to form links
[00:14:32.000 --> 00:14:34.720]   between words and images and concepts,
[00:14:34.720 --> 00:14:37.680]   actually very similar to these diffusion-based image models.
[00:14:37.680 --> 00:14:39.480]   So a cup can mean cup your hands,
[00:14:39.480 --> 00:14:41.600]   a cup that you hold like that, a World Cup,
[00:14:41.600 --> 00:14:42.960]   you know, maybe Argentina or France
[00:14:42.960 --> 00:14:45.720]   win it, who knows, we're recording just before that.
[00:14:45.720 --> 00:14:47.120]   So you need to calm down the brain somehow.
[00:14:47.120 --> 00:14:47.960]   But when I looked at it,
[00:14:47.960 --> 00:14:50.640]   like there were 18 different things that led to that,
[00:14:50.640 --> 00:14:52.400]   potentially, and certain treatments
[00:14:52.400 --> 00:14:54.160]   that make some kids worse, some kids better.
[00:14:54.160 --> 00:14:55.520]   So we did a lot of drug repurposing
[00:14:55.520 --> 00:14:56.640]   on an n-of-1 basis,
[00:14:56.640 --> 00:14:58.200]   and then I was advising governments
[00:14:58.200 --> 00:14:59.720]   and things at the time about AI
[00:14:59.720 --> 00:15:02.200]   and all sorts of other topics.
[00:15:02.200 --> 00:15:03.760]   I was like, this is really powerful technology,
[00:15:03.760 --> 00:15:05.040]   but, you know, I'm not a doctor,
[00:15:05.040 --> 00:15:08.480]   so I did my best to tell other people about it, but it's okay.
[00:15:08.480 --> 00:15:09.520]   But then about a few years ago,
[00:15:09.520 --> 00:15:11.000]   I realized that actually this technology
[00:15:11.000 --> 00:15:12.480]   could change the world.
[00:15:12.480 --> 00:15:15.440]   So we used it first in education, in small settlements
[00:15:15.440 --> 00:15:16.960]   and refugee camps around the world,
[00:15:16.960 --> 00:15:18.440]   through the charity my co-founder
[00:15:18.440 --> 00:15:19.720]   runs, Imagine Worldwide.
[00:15:19.720 --> 00:15:21.920]   And that's going massive,
[00:15:21.920 --> 00:15:23.560]   and there'll be announcements next year.
[00:15:23.560 --> 00:15:26.800]   And then working on the United Nations AI response
[00:15:26.800 --> 00:15:28.080]   on COVID-19 as well,
[00:15:28.080 --> 00:15:28.920]   because again, it's this thing
[00:15:28.920 --> 00:15:30.240]   where it's multi-systemic condition,
[00:15:30.240 --> 00:15:31.560]   no one knew what was causing it,
[00:15:31.560 --> 00:15:33.720]   and that knowledge needed to be organized.
[00:15:33.720 --> 00:15:35.600]   We had loads of bureaucracy through that,
[00:15:35.600 --> 00:15:38.240]   lots of companies promising stuff they wouldn't deliver,
[00:15:38.240 --> 00:15:40.800]   really got into this sector and realized,
[00:15:40.800 --> 00:15:42.800]   this AI is possibly the most powerful thing
[00:15:42.800 --> 00:15:43.640]   we've ever seen,
[00:15:43.640 --> 00:15:45.960]   because human level means a lot, right?
[00:15:45.960 --> 00:15:47.840]   And the only people that could build that
[00:15:47.840 --> 00:15:50.560]   were basically the big tech companies
[00:15:50.560 --> 00:15:52.600]   plus OpenAI and a couple of others.
[00:15:53.600 --> 00:15:55.680]   And none of them wanted to release it open,
[00:15:55.680 --> 00:15:58.320]   because it is powerful.
[00:15:58.320 --> 00:16:00.080]   And powerful means also dangerous, right?
[00:16:00.080 --> 00:16:01.400]   There's always an upside and a downside.
[00:16:01.400 --> 00:16:03.280]   I don't believe technology is neutral.
[00:16:03.280 --> 00:16:05.080]   But the way they were doing it,
[00:16:05.080 --> 00:16:06.240]   it would never be released.
[00:16:06.240 --> 00:16:08.320]   So it would only be available to a select few,
[00:16:08.320 --> 00:16:11.080]   and the select few could create any image in seconds,
[00:16:11.080 --> 00:16:12.840]   or write an entire story.
[00:16:12.840 --> 00:16:14.200]   When would it ever go to India,
[00:16:14.200 --> 00:16:16.120]   or Africa, or places like that?
[00:16:16.120 --> 00:16:17.280]   So that's why I thought,
[00:16:17.280 --> 00:16:20.080]   this is infrastructure just as important as the internet,
[00:16:20.080 --> 00:16:22.600]   for the next step in kind of human ability,
[00:16:22.600 --> 00:16:23.600]   and it should be open source.
[00:16:23.600 --> 00:16:26.280]   And then also, it's a better business model as well,
[00:16:26.280 --> 00:16:28.320]   putting on my hedge fund manager hat.
[00:16:28.320 --> 00:16:30.320]   'Cause all of our vital infrastructure
[00:16:30.320 --> 00:16:32.920]   for the internet, servers, databases, DevOps,
[00:16:32.920 --> 00:16:34.840]   it's all turned open source now.
[00:16:34.840 --> 00:16:36.760]   And the simple business model is scale and service.
[00:16:36.760 --> 00:16:38.600]   People come to you when they wanna scale it,
[00:16:38.600 --> 00:16:39.800]   well, customer versions of it.
[00:16:39.800 --> 00:16:41.880]   And I thought that's the winning business strategy.
[00:16:41.880 --> 00:16:43.160]   So that's how I started Stability
[00:16:43.160 --> 00:16:44.840]   as a mission-based organization,
[00:16:44.840 --> 00:16:47.040]   with a profit-based focus.
[00:16:47.040 --> 00:16:48.680]   But the profit is making these models
[00:16:48.680 --> 00:16:50.120]   available to everyone,
[00:16:50.120 --> 00:16:54.440]   and customizing them and scaling them for everyone as well.
[00:16:54.440 --> 00:16:55.320]   It has been interesting,
[00:16:55.320 --> 00:16:58.240]   as someone who's not a Silicon Valley native, or anything like that,
[00:16:58.240 --> 00:16:59.440]   talking to people about
[00:16:59.440 --> 00:17:02.240]   why we should let people have access to this technology,
[00:17:02.240 --> 00:17:04.320]   and that people are generally good, not bad.
[00:17:04.320 --> 00:17:07.960]   But yeah, it's been quite a ride.
[00:17:07.960 --> 00:17:09.280]   - I wanna play that back to you a little bit,
[00:17:09.280 --> 00:17:12.280]   because it's how you actually got into it,
[00:17:12.280 --> 00:17:14.760]   is such an interesting part of this.
[00:17:14.760 --> 00:17:18.160]   So you were working at a hedge fund,
[00:17:18.160 --> 00:17:21.760]   and you took a break because of your son's autism disorder,
[00:17:21.760 --> 00:17:24.400]   and you were able to do a bunch of different stuff
[00:17:24.400 --> 00:17:26.360]   with traditional AI to kind of figure out
[00:17:26.360 --> 00:17:28.800]   how to make sense of the different drugs,
[00:17:28.800 --> 00:17:30.640]   and the causes, and all that stuff.
[00:17:30.640 --> 00:17:33.480]   And then, did you see some people,
[00:17:33.480 --> 00:17:37.320]   was it OpenAI, and what they were working on,
[00:17:37.320 --> 00:17:38.240]   and you saw that,
[00:17:38.240 --> 00:17:40.080]   and how did the actual stable diffusion,
[00:17:40.080 --> 00:17:42.920]   and the involvement, and all of that stuff come to be?
[00:17:42.920 --> 00:17:44.840]   Because it sounds like you were around the industry,
[00:17:44.840 --> 00:17:47.000]   but did you meet someone, or
[00:17:47.000 --> 00:17:49.200]   was someone already working on stable diffusion,
[00:17:49.200 --> 00:17:50.640]   the open source project?
[00:17:50.640 --> 00:17:53.840]   - So I got involved about two and a half years ago,
[00:17:53.840 --> 00:17:56.080]   in EleutherAI, as part of the community,
[00:17:56.080 --> 00:17:56.920]   where we were like,
[00:17:56.920 --> 00:17:58.960]   let's build an open source version of GPT-3,
[00:17:58.960 --> 00:18:01.480]   'cause OpenAI stopped releasing stuff
[00:18:01.480 --> 00:18:04.280]   after their investment from Microsoft in 2019,
[00:18:04.280 --> 00:18:05.920]   saying that it was too dangerous,
[00:18:05.920 --> 00:18:07.080]   which is ironic,
[00:18:07.080 --> 00:18:09.800]   given the original founding statement of open AI,
[00:18:09.800 --> 00:18:12.360]   but again, it's their prerogative, right?
[00:18:12.360 --> 00:18:13.840]   'Cause again, technology that's powerful
[00:18:13.840 --> 00:18:17.640]   can be considered dangerous, and you have to deal with that.
[00:18:17.640 --> 00:18:18.680]   - So it was language models first,
[00:18:18.680 --> 00:18:20.800]   but then January of last year,
[00:18:20.800 --> 00:18:21.640]   when CLIP came out,
[00:18:21.640 --> 00:18:22.880]   I actually built a system,
[00:18:22.880 --> 00:18:24.680]   that's when I was actually getting over COVID,
[00:18:24.680 --> 00:18:27.360]   I built a system for my daughter to generate art,
[00:18:27.360 --> 00:18:29.520]   based on that, and it was amazing.
[00:18:29.520 --> 00:18:31.800]   So she created a vision board of what she wants to make,
[00:18:31.800 --> 00:18:32.760]   and then she made a description,
[00:18:32.760 --> 00:18:34.840]   and she created 16 images very slowly,
[00:18:34.840 --> 00:18:37.080]   which were like a bit smooshy and stylistic,
[00:18:37.080 --> 00:18:39.760]   and then she told it how each one of them was different,
[00:18:39.760 --> 00:18:42.120]   and the system interpreted that to generate 16 more,
[00:18:42.120 --> 00:18:43.560]   16 more, 16 more.
[00:18:43.560 --> 00:18:45.880]   Eight hours later, she generated an image,
[00:18:45.880 --> 00:18:48.760]   that she then sold as an NFT for India COVID relief.
[00:18:48.760 --> 00:18:50.400]   She raised $3,500.
[00:18:50.400 --> 00:18:52.960]   This is amazing,
[00:18:52.960 --> 00:18:55.160]   especially because I have aphantasia,
[00:18:55.160 --> 00:18:57.880]   so I can't view anything in my head.
[00:18:57.880 --> 00:18:59.520]   It's a condition where you can't visualize anything,
[00:18:59.520 --> 00:19:01.520]   and I was like, finally I can visualize stuff.
[00:19:01.520 --> 00:19:03.520]   Wouldn't it be great if anyone could visualize stuff,
[00:19:03.520 --> 00:19:06.240]   'cause the way that I've thought about things is that,
[00:19:06.240 --> 00:19:09.000]   you and I doing what we're doing right now, talking,
[00:19:09.000 --> 00:19:10.880]   is the easiest thing in the world for humans to do,
[00:19:10.880 --> 00:19:11.720]   relatively speaking.
[00:19:11.720 --> 00:19:12.960]   Sometimes you need a drink, right?
[00:19:12.960 --> 00:19:15.280]   But it's still relatively easy.
[00:19:15.280 --> 00:19:17.600]   Written is harder, that's why we pay people to be writers,
[00:19:17.600 --> 00:19:20.560]   and image is the hardest, creating art or PowerPoint,
[00:19:20.560 --> 00:19:22.280]   it's just really difficult and painful.
[00:19:22.280 --> 00:19:25.160]   But this technology can make it easy, so let's fund that.
[00:19:25.160 --> 00:19:27.000]   So last year I funded the whole space,
[00:19:27.000 --> 00:19:30.200]   and all the notebooks and models and developers,
[00:19:30.200 --> 00:19:33.480]   like I hired them, I funded them, gave them benefits,
[00:19:33.480 --> 00:19:36.920]   whatever they wanted, started building the compute resources,
[00:19:36.920 --> 00:19:38.960]   and there were a whole bunch of different models.
[00:19:38.960 --> 00:19:42.520]   The stable diffusion model came about from latent diffusion,
[00:19:42.520 --> 00:19:44.560]   which came out of CompVis.
[00:19:44.560 --> 00:19:47.880]   So that was a paper led and written by Robin Rombach,
[00:19:47.880 --> 00:19:50.040]   who's our lead generative AI researcher,
[00:19:50.040 --> 00:19:52.720]   and Andreas Blattmann, who's joining us shortly.
[00:19:52.720 --> 00:19:56.160]   That was kind of a bit of a breakthrough in high speed,
[00:19:56.160 --> 00:19:58.400]   because they didn't have access to many GPUs,
[00:19:58.400 --> 00:20:01.120]   so they really optimized for high speed diffusion.
[00:20:01.120 --> 00:20:04.560]   Most of the advances in the sector, I think,
[00:20:04.560 --> 00:20:06.480]   can be credited probably to Katherine Crowson,
[00:20:06.480 --> 00:20:08.440]   RiversHaveWings is her Twitter handle,
[00:20:08.440 --> 00:20:10.960]   who's our other lead generative AI researcher.
[00:20:10.960 --> 00:20:12.960]   And again, she was just in the community
[00:20:12.960 --> 00:20:14.840]   and we were just delighted to support her
[00:20:14.840 --> 00:20:16.080]   in kind of building these models,
[00:20:16.080 --> 00:20:19.240]   as well as other teams like the ruDALL-E team and others.
[00:20:19.240 --> 00:20:22.360]   But then in about February of this year,
[00:20:22.360 --> 00:20:24.000]   kind of Robin messaged me, and he's like,
[00:20:24.000 --> 00:20:25.000]   "We need to scale this up.
[00:20:25.000 --> 00:20:26.720]   I think it could be a breakthrough."
[00:20:26.720 --> 00:20:27.560]   I agreed to it.
[00:20:27.560 --> 00:20:29.640]   And then the original stable diffusion released in August
[00:20:29.640 --> 00:20:32.760]   was under LMU's CompVis lab.
[00:20:32.760 --> 00:20:34.800]   So CompVis is the lab led by Björn Ommer,
[00:20:34.800 --> 00:20:37.440]   and Robin, and then Patrick,
[00:20:37.440 --> 00:20:38.720]   who's at RunwayML
[00:20:38.720 --> 00:20:40.880]   as their lead generative AI researcher,
[00:20:40.880 --> 00:20:42.200]   were the two leads on that.
[00:20:42.200 --> 00:20:45.720]   So Robin is at Stability and then Patrick is there,
[00:20:45.720 --> 00:20:47.680]   creating it, because the approach that I've always taken
[00:20:47.680 --> 00:20:49.320]   at Stability, because we support communities
[00:20:49.320 --> 00:20:51.600]   doing all the models, is a collaborative one,
[00:20:51.600 --> 00:20:55.680]   whereby our core team, infra team,
[00:20:55.680 --> 00:20:58.200]   academics, independents, and others all coming together
[00:20:58.200 --> 00:20:59.840]   can build much better technology.
[00:20:59.840 --> 00:21:03.280]   And that's what happened with Stable Diffusion.
[00:21:03.280 --> 00:21:04.360]   A whole bunch of people got together,
[00:21:04.360 --> 00:21:06.680]   but it was really Robin and Patrick leading it.
[00:21:07.520 --> 00:21:10.440]   And they pushed the boundaries and achieved amazing things.
[00:21:10.440 --> 00:21:12.240]   They took 100,000 gigabytes of images
[00:21:12.240 --> 00:21:14.800]   and compressed it down to a 1.6 gigabyte file
[00:21:14.800 --> 00:21:16.320]   that could create just about anything.
[00:21:16.320 --> 00:21:18.760]   And that was insane.
[00:21:18.760 --> 00:21:21.000]   And that was released August 23rd.
[00:21:21.000 --> 00:21:23.040]   And yeah, since then.
[00:21:23.040 --> 00:21:24.560]   - So what's been the growth?
[00:21:24.560 --> 00:21:26.120]   So the company was established,
[00:21:26.120 --> 00:21:28.440]   so just so folks are following,
[00:21:28.440 --> 00:21:30.640]   stable diffusion is the open source project
[00:21:30.640 --> 00:21:32.600]   which you helped fund,
[00:21:32.600 --> 00:21:36.520]   and which part of your team also started.
[00:21:36.520 --> 00:21:40.160]   And then the company around it is Stability AI.
[00:21:40.160 --> 00:21:44.000]   So when was Stability actually incorporated?
[00:21:44.000 --> 00:21:45.280]   - It was probably about two years ago
[00:21:45.280 --> 00:21:48.040]   when we were leading one of the UN AI initiatives.
[00:21:48.040 --> 00:21:51.080]   So we designed and architected that.
[00:21:51.080 --> 00:21:53.360]   And then it kicked off probably about 14 months ago,
[00:21:53.360 --> 00:21:55.400]   saying let's do all the types of AI.
[00:21:55.400 --> 00:21:57.600]   So right now we do all the types of AI
[00:21:57.600 --> 00:22:00.240]   from language models to protein to image and others.
[00:22:00.240 --> 00:22:02.480]   But stable diffusion is the most popular open source software
[00:22:02.480 --> 00:22:03.760]   in the world ever.
[00:22:03.760 --> 00:22:06.280]   So since launching August 23rd,
[00:22:06.280 --> 00:22:09.920]   it's received 46,000 GitHub stars
[00:22:09.920 --> 00:22:12.840]   between version one, which was this collaborative thing,
[00:22:12.840 --> 00:22:15.240]   and version two, which was our highly optimized version
[00:22:15.240 --> 00:22:17.240]   that we ourselves released.
[00:22:17.240 --> 00:22:19.360]   Plus a bunch of tools around that.
[00:22:19.360 --> 00:22:20.240]   To give you an example,
[00:22:20.240 --> 00:22:22.520]   it's overtaken Bitcoin and Ethereum,
[00:22:22.520 --> 00:22:24.880]   which took about 10 years to get to that level
[00:22:24.880 --> 00:22:26.400]   of developer interest.
[00:22:26.400 --> 00:22:28.880]   And when you add up all the stars of the ecosystem,
[00:22:28.880 --> 00:22:30.880]   it's now the most popular open source software
[00:22:30.880 --> 00:22:33.040]   in the world ever, just in three months.
[00:22:34.040 --> 00:22:36.320]   So the other models are amazing,
[00:22:36.320 --> 00:22:38.040]   like the language models from Eleuther,
[00:22:38.040 --> 00:22:39.640]   which is one of the communities that we support,
[00:22:39.640 --> 00:22:41.680]   and we hope to spin off into foundation soon.
[00:22:41.680 --> 00:22:43.600]   They've been downloaded 25 million times
[00:22:43.600 --> 00:22:46.920]   and are the most popular language models in the world.
[00:22:46.920 --> 00:22:49.480]   But this thing is just the most disruptive thing ever,
[00:22:49.480 --> 00:22:51.880]   and next year it's gonna get even more disruptive.
[00:22:51.880 --> 00:22:53.600]   It's what powers things like Lensa,
[00:22:53.600 --> 00:22:54.960]   which is the number one app on the app store.
[00:22:54.960 --> 00:22:57.720]   I think they're making $5 million a day.
[00:22:57.720 --> 00:22:58.560]   It's quite nice.
[00:22:58.560 --> 00:23:01.160]   And a whole bunch of other things.
[00:23:01.160 --> 00:23:02.960]   - Maybe tell people what Lensa is.
[00:23:02.960 --> 00:23:04.960]   I played around with it, but...
[00:23:04.960 --> 00:23:06.760]   - Yeah, so Lensa or Dawn AI,
[00:23:06.760 --> 00:23:08.440]   you upload 10 pictures of your face,
[00:23:08.440 --> 00:23:10.920]   and then it puts you in all sorts of different,
[00:23:10.920 --> 00:23:14.160]   it's like artistic variants and things like that.
[00:23:14.160 --> 00:23:18.000]   - We'll upload here my version of it for people to see,
[00:23:18.000 --> 00:23:21.920]   but it's super cool to see the power of these things.
[00:23:21.920 --> 00:23:23.480]   - But these things are getting, again,
[00:23:23.480 --> 00:23:25.200]   exponentially more powerful.
[00:23:25.200 --> 00:23:28.640]   So when we released stable diffusion August the 23rd,
[00:23:28.640 --> 00:23:31.120]   it was 5.6 seconds for an image
[00:23:31.120 --> 00:23:33.120]   on the highest end graphics card.
[00:23:33.120 --> 00:23:36.520]   Now it's 0.9 seconds for an image.
[00:23:36.520 --> 00:23:39.600]   In January, it'll be 30 images a second.
[00:23:39.600 --> 00:23:41.080]   That's a hundred times speed increase
[00:23:41.080 --> 00:23:42.720]   that we've managed to achieve
[00:23:42.720 --> 00:23:45.160]   working with various teams around the world,
[00:23:45.160 --> 00:23:47.320]   which is insane for this tiny one gigabyte file.
[00:23:47.320 --> 00:23:49.200]   So what you just saw with Lensa,
[00:23:49.200 --> 00:23:52.320]   imagine if you could do that whole process 20 times faster.
[00:23:52.320 --> 00:23:53.320]   - I mean, it's super cool.
[00:23:53.320 --> 00:23:55.880]   Hopefully people admire the picture of me
[00:23:55.880 --> 00:23:58.640]   that we showed on screen for YouTube people.
[00:23:58.640 --> 00:24:02.880]   But what are the use cases today
[00:24:02.880 --> 00:24:07.480]   that have people so excited in a practical sense?
[00:24:07.480 --> 00:24:09.600]   Like obviously it's cool to be able to do this
[00:24:09.600 --> 00:24:11.040]   in real time of myself, but--
[00:24:11.040 --> 00:24:13.240]   - Look, it disrupts the entire creative industry
[00:24:13.240 --> 00:24:16.480]   and in a year or two we'll be generating whole movies in real time.
[00:24:16.480 --> 00:24:17.760]   - And what does that actually mean?
[00:24:17.760 --> 00:24:19.960]   - It means you describe that I want to generate a movie
[00:24:19.960 --> 00:24:21.240]   about, I don't know,
[00:24:21.240 --> 00:24:25.560]   Emad and Logan having a coffee at Starbucks or whatever.
[00:24:25.560 --> 00:24:27.520]   You input a few assets of our faces
[00:24:27.520 --> 00:24:29.680]   and then a short while later you have a movie
[00:24:29.680 --> 00:24:31.160]   about them having a chat
[00:24:31.160 --> 00:24:33.000]   and the chat is instantly generated as well
[00:24:33.000 --> 00:24:34.600]   about any topic that you want.
[00:24:34.600 --> 00:24:36.720]   If you get a practical example right now,
[00:24:36.720 --> 00:24:38.680]   there's a film that is shooting,
[00:24:38.680 --> 00:24:41.480]   it can't reveal details with some very famous people.
[00:24:41.480 --> 00:24:44.480]   They had to do a photo binder
[00:24:44.480 --> 00:24:47.920]   with like 30 different actresses inside it
[00:24:47.920 --> 00:24:50.800]   and those actresses were victims of a serial killer.
[00:24:50.800 --> 00:24:53.160]   It would have cost half a million dollars
[00:24:53.160 --> 00:24:54.680]   when you look at SAG daily rates,
[00:24:54.680 --> 00:24:56.920]   makeup, shooting, everything like that.
[00:24:56.920 --> 00:24:58.880]   They did it in two hours using this technology,
[00:24:58.880 --> 00:25:01.760]   saved the production half a million dollars.
[00:25:01.760 --> 00:25:06.120]   We're seeing companies basically bring videos to market,
[00:25:06.120 --> 00:25:10.480]   75% quicker, so, well, three, four times quicker.
[00:25:10.480 --> 00:25:12.440]   Video games, the assets for those,
[00:25:12.440 --> 00:25:14.240]   will be generated even quicker as well,
[00:25:14.240 --> 00:25:16.360]   and that's about 25% of video game budgets.
[00:25:16.360 --> 00:25:17.920]   So the people that are using this technology
[00:25:17.920 --> 00:25:20.800]   are just massively slashing creation costs.
[00:25:20.800 --> 00:25:23.600]   So there's a real enterprise solution version of that.
[00:25:23.600 --> 00:25:25.640]   For the average individual listening to this,
[00:25:25.640 --> 00:25:27.120]   this is the technology that will mean
[00:25:27.120 --> 00:25:29.200]   that you'll never have to build a PowerPoint slide again
[00:25:29.200 --> 00:25:31.880]   in a couple of years because you just describe it
[00:25:31.880 --> 00:25:34.440]   and then you'll say make it happier or sad or whatever
[00:25:34.440 --> 00:25:36.800]   when you combine this with a language model and code model.
[00:25:36.800 --> 00:25:39.480]   And you'll never have to see any of that abstraction.
[00:25:39.480 --> 00:25:43.760]   You know, so this is crazy impactful technology.
[00:25:43.760 --> 00:25:45.520]   The fact that it goes real time,
[00:25:45.520 --> 00:25:48.160]   it's not just the image creation, right?
[00:25:48.160 --> 00:25:49.000]   It's the image editing.
[00:25:49.000 --> 00:25:50.760]   So we released an inpainting model,
[00:25:50.760 --> 00:25:52.520]   a depth-to-image model that takes your face
[00:25:52.520 --> 00:25:55.160]   and puts it into 3D and then understands all the lighting
[00:25:55.160 --> 00:25:57.200]   so you can adjust your lighting dynamically.
[00:25:57.200 --> 00:26:02.600]   An upscaler, so you can go from 1K to 4K, almost real time.
[00:26:02.600 --> 00:26:04.440]   Making that real time means that, you know,
[00:26:04.440 --> 00:26:06.880]   you could say I want Emad to have a hat
[00:26:06.880 --> 00:26:10.480]   and then I want him to have a less bushy mustache, you know,
[00:26:10.480 --> 00:26:12.240]   what happens if his eyes are green
[00:26:12.240 --> 00:26:13.720]   and it will do that instantly.
[00:26:13.720 --> 00:26:16.800]   It removes all the barriers to creation.
[00:26:16.800 --> 00:26:18.200]   I think people aren't ready for that.
[00:26:18.200 --> 00:26:19.680]   I'm not ready for that.
[00:26:19.680 --> 00:26:21.660]   And it's there right now.
[00:26:21.660 --> 00:26:24.440]   This is also the case when you see it go consumer.
[00:26:24.440 --> 00:26:26.080]   Like I said, Lensa has gone viral,
[00:26:26.080 --> 00:26:28.200]   our kids tell us about it and stuff like that.
[00:26:28.200 --> 00:26:30.160]   And there's other technology that's happening exactly
[00:26:30.160 --> 00:26:32.320]   at the same time that will be just as disruptive
[00:26:32.320 --> 00:26:33.920]   like ChatGPT, et cetera.
[00:26:33.920 --> 00:26:35.640]   - Yeah, I want to get into the chat side of it.
[00:26:35.640 --> 00:26:39.240]   But while we're talking about image and video and all that,
[00:26:39.240 --> 00:26:42.560]   I guess I'll give you an opportunity to wax poetic
[00:26:42.560 --> 00:26:46.560]   and maybe a little philosophically about the implications
[00:26:46.560 --> 00:26:50.560]   of what this actually means and what creativity,
[00:26:50.560 --> 00:26:52.920]   'cause I assume anyone listening to this,
[00:26:52.920 --> 00:26:56.160]   like some of the things of the ability to make a movie
[00:26:56.160 --> 00:26:59.360]   in real time about the two of us having coffee
[00:26:59.360 --> 00:27:01.800]   and change your mustache and all that,
[00:27:01.800 --> 00:27:03.800]   it's a great tangible example,
[00:27:03.800 --> 00:27:07.640]   but maybe hard to understand why that matters.
[00:27:07.640 --> 00:27:09.760]   And obviously there's more tangible cases
[00:27:09.760 --> 00:27:12.800]   of the ability to green screen out backgrounds
[00:27:12.800 --> 00:27:16.760]   with a serial killer example in the movie example you gave.
[00:27:16.760 --> 00:27:19.600]   But can you talk a little bit about like the power
[00:27:19.600 --> 00:27:23.760]   of creativity and imagery and what you think
[00:27:23.760 --> 00:27:26.480]   this unlocks for people?
[00:27:26.480 --> 00:27:28.600]   - Yeah, look, my mom sends me memes every day now
[00:27:28.600 --> 00:27:30.680]   using this technology about why I don't call her
[00:27:30.680 --> 00:27:33.320]   and my guilt levels have gone up massively.
[00:27:33.320 --> 00:27:35.520]   You know, it allows people just to create anything
[00:27:35.520 --> 00:27:38.040]   and the value of creativity versus consumption
[00:27:38.040 --> 00:27:39.120]   can't be underestimated.
[00:27:39.120 --> 00:27:40.720]   Like one of the most effective therapies
[00:27:40.720 --> 00:27:43.080]   for mental health is art therapy for a reason
[00:27:43.080 --> 00:27:45.400]   because this is communication.
[00:27:45.400 --> 00:27:48.320]   And how many people listening to this right now
[00:27:48.320 --> 00:27:49.960]   believe they can create?
[00:27:49.960 --> 00:27:52.600]   Probably very few, but the reality is that everyone can,
[00:27:52.600 --> 00:27:54.200]   but they don't have the tools to
[00:27:54.200 --> 00:27:55.120]   and they have barriers to it.
[00:27:55.120 --> 00:27:57.400]   Those barriers will be removed as of next year.
[00:27:57.400 --> 00:27:59.120]   You'll be able to create anything you can imagine,
[00:27:59.120 --> 00:28:02.640]   first in 2D, then in audio, then in 3D, then in video.
[00:28:02.640 --> 00:28:04.520]   And then you'll be wanting to share stories.
[00:28:04.520 --> 00:28:06.160]   I don't think it'll be like, you remember that
[00:28:06.160 --> 00:28:08.560]   scene in WALL-E where he's got like that VR headset
[00:28:08.560 --> 00:28:11.400]   and he's all fat and stuff and everyone's in their own world.
[00:28:11.400 --> 00:28:12.240]   I don't think that's the case.
[00:28:12.240 --> 00:28:14.680]   People like sharing, we're story-driven, narrative creatures,
[00:28:14.680 --> 00:28:16.560]   and this allows us to tell more stories.
[00:28:16.560 --> 00:28:18.840]   And I think it gives people agency
[00:28:18.840 --> 00:28:21.800]   because, you know, again, like I've done lots of art therapy
[00:28:21.800 --> 00:28:24.640]   with people, like it really improves their lives.
[00:28:24.640 --> 00:28:26.560]   It improves your life when you are creating
[00:28:26.560 --> 00:28:28.080]   no matter what it is.
[00:28:28.080 --> 00:28:30.320]   And then again, too few of us believe that we can.
[00:28:30.320 --> 00:28:31.640]   We lose that childhood joy, right?
[00:28:31.640 --> 00:28:34.160]   Like when you're a kid, of course you can create.
[00:28:34.160 --> 00:28:35.960]   Then you get to your teenage years and you're like,
[00:28:35.960 --> 00:28:37.120]   "Ah, those people are better than me.
[00:28:37.120 --> 00:28:38.520]   I don't have time to do that."
[00:28:38.520 --> 00:28:40.160]   And then it moves away from that.
[00:28:40.160 --> 00:28:42.440]   And when you get to sad old folk like us,
[00:28:42.440 --> 00:28:45.560]   it's like, I have to learn to draw or paint
[00:28:45.560 --> 00:28:46.400]   or something like that.
[00:28:46.400 --> 00:28:49.040]   You do that on a holiday and then it's really rewarding.
[00:28:49.040 --> 00:28:51.280]   But too many people don't have access to that.
[00:28:51.280 --> 00:28:53.160]   I think, again, we've made that happen.
[00:28:53.160 --> 00:28:56.240]   So I think creation beats consumption
[00:28:56.240 --> 00:28:58.280]   and now everyone can create.
[00:28:58.280 --> 00:29:00.400]   And so I think the world will be happier.
[00:29:00.400 --> 00:29:02.440]   Some people are gonna use it in a douchebag way
[00:29:02.440 --> 00:29:04.040]   but that's why we live in a society
[00:29:04.040 --> 00:29:06.160]   that has mitigants against this.
[00:29:06.160 --> 00:29:08.840]   And this technology is pretty inevitable.
[00:29:08.840 --> 00:29:10.280]   It's going now.
[00:29:10.280 --> 00:29:13.080]   And again, exponentials are a hell of a thing.
[00:29:13.080 --> 00:29:14.560]   So we've got to get used to it
[00:29:14.560 --> 00:29:15.720]   where it impacts our industries
[00:29:15.720 --> 00:29:17.400]   and we're gonna take advantage of it
[00:29:17.400 --> 00:29:19.200]   where it can make our lives better.
[00:29:19.200 --> 00:29:21.680]   - The inevitability of a lot of things you're talking about
[00:29:21.680 --> 00:29:25.000]   is increased productivity, increased efficiency,
[00:29:25.000 --> 00:29:27.640]   increased creativity, right?
[00:29:27.640 --> 00:29:30.920]   The flip side of any big gain in productivity
[00:29:30.920 --> 00:29:35.600]   is a loss of potential jobs
[00:29:35.600 --> 00:29:38.240]   or existing skill sets or responsibilities
[00:29:38.240 --> 00:29:39.080]   that people have.
[00:29:39.080 --> 00:29:42.000]   In your case, if you speed something up 75%,
[00:29:42.000 --> 00:29:43.680]   there's some vector of time.
[00:29:43.680 --> 00:29:45.600]   And so we're giving back time to people.
[00:29:45.600 --> 00:29:48.560]   But there's gonna be inevitability of
[00:29:48.560 --> 00:29:51.200]   some people are gonna probably lose their jobs
[00:29:51.200 --> 00:29:52.920]   over these types of things.
[00:29:52.920 --> 00:29:56.400]   How do you think about that in productivity gains
[00:29:56.400 --> 00:30:00.760]   versus some of the things that could create job loss?
[00:30:00.760 --> 00:30:03.240]   - Yeah, so this has been the nature of technology, right?
[00:30:03.240 --> 00:30:05.240]   Technology has always been about productivity gains.
[00:30:05.240 --> 00:30:08.040]   And yet here we are at full employment today.
[00:30:08.040 --> 00:30:09.640]   Like I believe the number of photographers
[00:30:09.640 --> 00:30:12.680]   has increased by 25% over the last three years.
[00:30:12.680 --> 00:30:15.040]   iPhones are really good at photographs now.
[00:30:15.040 --> 00:30:16.360]   But yeah, there's still a job there.
[00:30:16.360 --> 00:30:18.880]   This is the thing I said about AlphaGo and Lee Sedol.
[00:30:18.880 --> 00:30:20.120]   The average level of Go players
[00:30:20.120 --> 00:30:22.560]   has gone up exponentially over the last few years.
[00:30:22.560 --> 00:30:24.000]   So I think this is augmenting technology
[00:30:24.000 --> 00:30:26.080]   as opposed to replacing technology.
[00:30:26.080 --> 00:30:28.000]   But there are certain areas where you have to consider
[00:30:28.000 --> 00:30:29.560]   what is the future of the rendering industry
[00:30:29.560 --> 00:30:30.440]   and other things like that
[00:30:30.440 --> 00:30:32.560]   when you can automatically generate any type of asset
[00:30:32.560 --> 00:30:33.960]   pretty much in real time.
[00:30:33.960 --> 00:30:35.160]   You know?
[00:30:35.160 --> 00:30:37.160]   So I think it will create and it will destroy.
[00:30:37.160 --> 00:30:38.440]   And this is the nature of technology.
[00:30:38.440 --> 00:30:40.440]   Again, the technology is not neutral.
[00:30:40.440 --> 00:30:43.480]   Technology is kind of this onward-marching thing.
[00:30:43.480 --> 00:30:45.680]   And it will never be completely positive.
[00:30:45.680 --> 00:30:50.320]   So yeah, this is why I think the important thing
[00:30:50.320 --> 00:30:52.840]   is given the pace of this adoption of this technology
[00:30:52.840 --> 00:30:55.000]   versus any technology that I saw
[00:30:55.000 --> 00:30:58.600]   when I was a hedge fund manager or I was a VC for a bit,
[00:30:58.600 --> 00:30:59.760]   people just got to get used to it
[00:30:59.760 --> 00:31:00.920]   and they got to really understand it
[00:31:00.920 --> 00:31:03.040]   because it's something quite alien
[00:31:03.040 --> 00:31:04.320]   and massively impactful.
[00:31:04.320 --> 00:31:05.160]   - I could make the,
[00:31:05.160 --> 00:31:07.400]   or I think people have made the case
[00:31:07.400 --> 00:31:09.800]   that this is so powerful,
[00:31:09.800 --> 00:31:13.640]   that there's negative ramifications,
[00:31:13.640 --> 00:31:16.320]   the likes of which we don't even know
[00:31:16.320 --> 00:31:21.040]   what the implications of this are from a societal standpoint.
[00:31:21.040 --> 00:31:22.960]   And I think we've seen some people call
[00:31:22.960 --> 00:31:25.960]   for all these things to be pulled back
[00:31:25.960 --> 00:31:27.800]   until we have a better understanding
[00:31:27.800 --> 00:31:30.560]   and can put in the right constraints and controls.
[00:31:30.560 --> 00:31:32.920]   I get the feeling you have a much more optimistic
[00:31:32.920 --> 00:31:34.720]   take about human nature
[00:31:34.720 --> 00:31:39.400]   and also how these things sort out over time.
[00:31:39.400 --> 00:31:41.840]   Can you just talk a little bit about like your perspective
[00:31:41.840 --> 00:31:46.840]   on the trade off between openness and accessibility
[00:31:46.840 --> 00:31:52.040]   with negative unintended consequences in general?
[00:31:52.040 --> 00:31:53.920]   - Yeah, well, so powerful technologies can do anything
[00:31:53.920 --> 00:31:56.160]   'cause these models are general few shot learners
[00:31:56.160 --> 00:31:58.080]   as they can learn a little and then they can just do
[00:31:58.080 --> 00:31:59.240]   just about anything.
[00:31:59.240 --> 00:32:00.920]   It's a very powerful one, right?
[00:32:00.920 --> 00:32:03.280]   But like, I was thinking about people saying that,
[00:32:03.280 --> 00:32:05.680]   I was like, so why don't they want Indians or Africans
[00:32:05.680 --> 00:32:07.480]   to have this technology?
[00:32:07.480 --> 00:32:10.480]   Because actually it's an inherently colonialist
[00:32:10.480 --> 00:32:12.600]   kind of racist way to look at the world
[00:32:12.600 --> 00:32:14.760]   because when you ask them, it's "once they're educated enough"
[00:32:14.760 --> 00:32:16.160]   or "they don't know better."
[00:32:16.160 --> 00:32:19.320]   Because there is this thing where it's like
[00:32:19.320 --> 00:32:20.560]   the tech people know better,
[00:32:20.560 --> 00:32:23.160]   but then it's like nobody elected them.
[00:32:23.160 --> 00:32:24.160]   So they're self-appointed.
[00:32:24.160 --> 00:32:25.160]   So what's the answer to this?
[00:32:25.160 --> 00:32:28.720]   The answer for me was to move towards open,
[00:32:28.720 --> 00:32:30.560]   but that widens the discussion.
[00:32:30.560 --> 00:32:32.200]   So how many developers now are developing
[00:32:32.200 --> 00:32:33.440]   on stable diffusion?
[00:32:33.440 --> 00:32:34.600]   Millions.
[00:32:34.600 --> 00:32:35.760]   They're all voices now
[00:32:35.760 --> 00:32:38.000]   and they're people who weren't developing before.
[00:32:38.000 --> 00:32:39.640]   How many governments are talking about it?
[00:32:39.640 --> 00:32:41.080]   Just about all of them.
[00:32:41.080 --> 00:32:43.640]   All of the media studios are talking about it now
[00:32:43.640 --> 00:32:45.240]   and it's in the public sphere.
[00:32:45.240 --> 00:32:47.560]   And there will be policy debates on this.
[00:32:47.560 --> 00:32:49.080]   And that's a good thing.
[00:32:49.080 --> 00:32:51.080]   You know, again, we have mechanisms in society
[00:32:51.080 --> 00:32:53.440]   to decide about these technologies and other things.
[00:32:53.440 --> 00:32:54.920]   And I think the overwhelming output
[00:32:54.920 --> 00:32:56.680]   of stable diffusion has been good.
[00:32:56.680 --> 00:32:58.800]   Like 4chan and places like that have had this
[00:32:58.800 --> 00:33:01.680]   for months now, and nothing bad has really come of it.
[00:33:01.680 --> 00:33:03.320]   You know, we've had technologies to deepfake
[00:33:03.320 --> 00:33:04.760]   and other things, but people haven't realized
[00:33:04.760 --> 00:33:06.480]   that it's quite so easy to do.
[00:33:06.480 --> 00:33:07.920]   Now they do realize.
[00:33:07.920 --> 00:33:09.960]   So my thing has been about,
[00:33:09.960 --> 00:33:12.160]   I think it's unethical to control access
[00:33:12.160 --> 00:33:13.600]   to powerful technology.
[00:33:13.600 --> 00:33:15.840]   This kind of echoes as well the cryptography debates
[00:33:15.840 --> 00:33:17.520]   that we had decades ago,
[00:33:17.520 --> 00:33:20.200]   whereas like bad guys can use this to do bad things.
[00:33:20.200 --> 00:33:22.520]   What would have happened if you'd had mathematics outlawed
[00:33:22.520 --> 00:33:24.000]   back then?
[00:33:24.000 --> 00:33:25.600]   We wouldn't have cryptography saving us
[00:33:25.600 --> 00:33:27.840]   from the bad guys, you know?
[00:33:27.840 --> 00:33:30.120]   So I think there's a lot of red teaming
[00:33:30.120 --> 00:33:31.160]   because it is dangerous
[00:33:31.160 --> 00:33:33.280]   and because only big corporations could do this
[00:33:33.280 --> 00:33:35.480]   and they were too afraid to release it.
[00:33:35.480 --> 00:33:37.240]   I don't think there was enough green teaming
[00:33:37.240 --> 00:33:38.880]   of what could be the positives from this.
[00:33:38.880 --> 00:33:39.880]   'Cause like I said,
[00:33:39.880 --> 00:33:41.800]   how many people in the world can create next year
[00:33:41.800 --> 00:33:43.520]   who couldn't create before, because of us?
[00:33:43.520 --> 00:33:46.320]   Hundreds of millions.
[00:33:46.320 --> 00:33:49.240]   And that is a net benefit and happiness to society.
[00:33:49.240 --> 00:33:51.760]   But that isn't reflected in anyone's bottom line, right?
[00:33:51.760 --> 00:33:53.600]   Well, maybe ours, but you know what I mean.
[00:33:53.600 --> 00:33:55.560]   So I think this is a very complicated thing
[00:33:55.560 --> 00:33:57.760]   and people talk about it in three terms.
[00:33:57.760 --> 00:33:58.720]   There's ethics.
[00:33:58.720 --> 00:34:01.560]   And ethics is my personal and your personal.
[00:34:01.560 --> 00:34:03.600]   I don't think anyone has a right
[00:34:03.600 --> 00:34:05.360]   to call anyone else unethical
[00:34:05.360 --> 00:34:07.280]   unless they really understand that person.
[00:34:07.280 --> 00:34:09.560]   Morals are what we decide as society.
[00:34:09.560 --> 00:34:12.040]   But my morals in the UK are different to your morals
[00:34:12.040 --> 00:34:15.160]   in the US, which are different to Indian and Chinese morals.
[00:34:15.160 --> 00:34:16.680]   And then finally, there's legal.
[00:34:16.680 --> 00:34:18.360]   And legal are these codifications
[00:34:18.360 --> 00:34:21.040]   of moral boundaries that we kind of put in.
[00:34:21.040 --> 00:34:23.040]   And we need to catch up on all of these.
[00:34:23.040 --> 00:34:25.120]   One final point I'd like to make is that the alternative
[00:34:25.120 --> 00:34:26.800]   is that this technology is the preserve
[00:34:26.800 --> 00:34:30.080]   of large companies who mostly focus on ads.
[00:34:30.080 --> 00:34:33.000]   And it's really persuasive this technology.
[00:34:33.000 --> 00:34:36.360]   So like we've got human realistic emotional voices
[00:34:36.360 --> 00:34:38.080]   and faces and other things.
[00:34:38.080 --> 00:34:39.480]   What would happen if your chat assistant
[00:34:39.480 --> 00:34:42.320]   started whispering at you to buy something?
[00:34:42.320 --> 00:34:44.400]   What's the regulation legislation around that?
[00:34:44.400 --> 00:34:46.640]   What's the legislation around really large language models
[00:34:46.640 --> 00:34:48.400]   as opposed to our tiny models
[00:34:48.400 --> 00:34:50.320]   that can get to human level
[00:34:50.320 --> 00:34:52.840]   and should only big tech be allowed to use that?
[00:34:52.840 --> 00:34:54.680]   I think it should all be regulated.
[00:34:54.680 --> 00:34:56.720]   I don't know what that regulation should be.
[00:34:56.720 --> 00:34:58.040]   And I will give my voice to it.
[00:34:58.040 --> 00:35:00.720]   And I hope more people give their voices to it
[00:35:00.720 --> 00:35:03.560]   instead of it being just decided behind closed doors.
[00:35:03.560 --> 00:35:05.560]   So it's quite a complex thing.
[00:35:05.560 --> 00:35:07.840]   I don't think there's any governance structure currently
[00:35:07.840 --> 00:35:10.520]   that I've seen that can handle it.
[00:35:10.520 --> 00:35:11.560]   I think that we should work together
[00:35:11.560 --> 00:35:13.400]   to put these things in place.
[00:35:13.400 --> 00:35:15.840]   Now, the state of the industry today,
[00:35:15.840 --> 00:35:18.560]   there's obviously the large companies
[00:35:18.560 --> 00:35:22.600]   that have some advertising related business, Google,
[00:35:22.600 --> 00:35:25.000]   Facebook, et cetera, Microsoft.
[00:35:25.000 --> 00:35:28.560]   And then there's the private companies
[00:35:28.560 --> 00:35:30.800]   kind of going after this opportunity.
[00:35:30.800 --> 00:35:33.880]   In my mind, there's businesses like,
[00:35:33.880 --> 00:35:38.160]   we talked about runway or mid-journey or some of those,
[00:35:38.160 --> 00:35:40.600]   but the two fundamental companies
[00:35:40.600 --> 00:35:41.840]   that seem to be going after this,
[00:35:41.840 --> 00:35:43.840]   or at least that I hear about when I talk to people,
[00:35:43.840 --> 00:35:46.320]   are you all and then OpenAI,
[00:35:46.320 --> 00:35:48.160]   who we've referenced before.
[00:35:48.160 --> 00:35:51.800]   How do you contrast your style
[00:35:51.800 --> 00:35:54.760]   and approach to what they're doing?
[00:35:54.760 --> 00:35:57.040]   So, yeah, I think OpenAI and us
[00:35:57.040 --> 00:35:58.920]   are the only independent multi-modal companies.
[00:35:58.920 --> 00:36:02.480]   Multi-modal means that we do all of the types of models, right?
[00:36:02.480 --> 00:36:05.280]   Again, Runway do fantastic work around video; Midjourney,
[00:36:05.280 --> 00:36:08.640]   David is super focused on kind of images
[00:36:08.640 --> 00:36:11.040]   and video games in the future that are streamed.
[00:36:11.040 --> 00:36:12.680]   That'll be super cool.
[00:36:12.680 --> 00:36:15.120]   We're kind of foundation layer companies as it were.
[00:36:15.120 --> 00:36:16.320]   So we're building the building blocks
[00:36:16.320 --> 00:36:17.640]   that make this accessible.
[00:36:17.640 --> 00:36:20.640]   OpenAI kind of emerged from Elon Musk
[00:36:20.640 --> 00:36:22.560]   and a whole bunch of others wanting to build
[00:36:22.560 --> 00:36:26.040]   an open nonprofit for getting to AGI,
[00:36:26.040 --> 00:36:28.840]   this AI that can do anything to augment human potential.
[00:36:28.840 --> 00:36:31.920]   It had ups and downs and changed over the years,
[00:36:31.920 --> 00:36:33.280]   but it's doing amazing work right now,
[00:36:33.280 --> 00:36:36.040]   but their objective is that generalized intelligence
[00:36:36.040 --> 00:36:38.080]   and their model has moved more close now
[00:36:38.080 --> 00:36:39.280]   because they think it's dangerous.
[00:36:39.280 --> 00:36:40.880]   Although they've released amazing open-source stuff,
[00:36:40.880 --> 00:36:42.680]   so they just released a good tokenizer,
[00:36:42.680 --> 00:36:44.360]   there is Whisper, which is one of the best things
[00:36:44.360 --> 00:36:47.000]   to turn this podcast into text, et cetera.
[00:36:47.000 --> 00:36:48.600]   So, yeah, they're selective about that
[00:36:48.600 --> 00:36:50.280]   and that's their prerogative.
[00:36:50.280 --> 00:36:52.640]   Their model is data to the models.
[00:36:52.640 --> 00:36:54.640]   So you can fine-tune their GPT,
[00:36:54.640 --> 00:36:55.560]   which is a language model,
[00:36:55.560 --> 00:36:57.240]   and I'm sure their image model too.
[00:36:57.240 --> 00:36:59.320]   Hopefully they'll open-source GPT-3 as well
[00:36:59.320 --> 00:37:01.640]   as DALL-E 2, which is their new image model,
[00:37:01.640 --> 00:37:03.200]   now that we've shown that it's relatively safe,
[00:37:03.200 --> 00:37:04.760]   'cause they wanna make their models better.
[00:37:04.760 --> 00:37:06.120]   And then they have a deal with Microsoft
[00:37:06.120 --> 00:37:08.560]   where Microsoft commercializes their models
[00:37:08.560 --> 00:37:12.640]   and then funds them, which is a great kind of partnership.
[00:37:12.640 --> 00:37:13.640]   Our model's a bit different
[00:37:13.640 --> 00:37:16.480]   and our model is models to the data,
[00:37:16.480 --> 00:37:18.280]   whereby we're creating open-source models
[00:37:18.280 --> 00:37:20.000]   that you can take onto your own code base
[00:37:20.000 --> 00:37:21.840]   or your own asset images,
[00:37:21.840 --> 00:37:24.720]   and we've teamed up with AWS SageMaker for this,
[00:37:24.720 --> 00:37:27.560]   and then you can customize it to your own experiences.
[00:37:27.560 --> 00:37:29.480]   We don't really care about taking customer data
[00:37:29.480 --> 00:37:31.080]   and using it to improve our own models.
[00:37:31.080 --> 00:37:34.320]   Instead, we're like scale and service is the way.
[00:37:34.320 --> 00:37:36.960]   You know, so if you wanna customize version of the model,
[00:37:36.960 --> 00:37:38.600]   the best people to come to is us
[00:37:38.600 --> 00:37:40.080]   and it'll be a million dollars a pop
[00:37:40.080 --> 00:37:42.080]   to train those things up, you know,
[00:37:42.080 --> 00:37:43.560]   actually more in some cases.
[00:37:43.560 --> 00:37:46.000]   If you wanna scale it, scaling these models is hard,
[00:37:46.000 --> 00:37:47.800]   but we can scale your customized models for you
[00:37:47.800 --> 00:37:50.360]   and again, we will have a fair deal on that one.
[00:37:50.360 --> 00:37:52.080]   So even though we overlap in certain areas,
[00:37:52.080 --> 00:37:53.600]   I think we have very different philosophies
[00:37:53.600 --> 00:37:55.840]   'cause also our philosophy is getting this AI out to everyone
[00:37:55.840 --> 00:37:58.000]   and having everyone have their own personalized models
[00:37:58.000 --> 00:38:00.240]   versus building an AI that can do anything.
[00:38:00.240 --> 00:38:02.200]   And I think it's quite complementary as well
[00:38:02.200 --> 00:38:05.120]   'cause you'll always have a Windows, Linux, you know,
[00:38:05.120 --> 00:38:07.440]   kind of thing going on there,
[00:38:07.440 --> 00:38:09.320]   Oracle, MySQL, et cetera.
[00:38:09.320 --> 00:38:13.440]   So yeah, I think that hopefully kind of defines it.
[00:38:13.440 --> 00:38:15.400]   I think the final thing as well is that their focus
[00:38:15.400 --> 00:38:16.480]   has been on language models,
[00:38:16.480 --> 00:38:19.600]   such as the amazing ChatGPT they've just released,
[00:38:19.600 --> 00:38:23.400]   with some image, whereas our focus is on media models
[00:38:23.400 --> 00:38:27.000]   with some elements of language and others.
[00:38:27.000 --> 00:38:28.840]   I think in the respective spaces,
[00:38:28.840 --> 00:38:30.720]   there's a lot of people who are quite amazing
[00:38:30.720 --> 00:38:32.360]   doing language models.
[00:38:32.360 --> 00:38:35.800]   I think we're the only ones doing media models at scale.
[00:38:35.800 --> 00:38:37.480]   - And inherent in all of that,
[00:38:37.480 --> 00:38:41.880]   the approach and the philosophy is the open source nature
[00:38:41.880 --> 00:38:44.240]   of what you guys are doing, right?
[00:38:44.240 --> 00:38:47.040]   And as you think about the business model
[00:38:47.040 --> 00:38:48.080]   around the open source,
[00:38:48.080 --> 00:38:51.800]   you touched on the service and management
[00:38:51.800 --> 00:38:54.040]   that you can have with different customers.
[00:38:54.040 --> 00:38:57.600]   But how do you think about what the ultimate business model
[00:38:57.600 --> 00:39:00.520]   for stability AI is going to be over time?
[00:39:00.520 --> 00:39:01.960]   Does it stay around that?
[00:39:01.960 --> 00:39:03.400]   You sell different applications
[00:39:03.400 --> 00:39:05.320]   on top of the existing primitives.
[00:39:05.320 --> 00:39:08.400]   What do you think the end state of this is for you?
[00:39:08.400 --> 00:39:10.000]   - We're fully vertically integrated.
[00:39:10.000 --> 00:39:11.120]   So we have our own products,
[00:39:11.120 --> 00:39:14.560]   like DreamStudio Pro that we're releasing in January,
[00:39:14.560 --> 00:39:16.560]   where, in time, you can generate movies
[00:39:16.560 --> 00:39:18.600]   and storyboarding and 3D cameras
[00:39:18.600 --> 00:39:20.400]   and audio integration with our audio models
[00:39:20.400 --> 00:39:21.320]   and things like that.
[00:39:21.320 --> 00:39:22.520]   'Cause the models need to be used.
[00:39:22.520 --> 00:39:25.200]   We've also got integrations into Photoshop
[00:39:25.200 --> 00:39:27.720]   and all the other kind of interfaces as well,
[00:39:27.720 --> 00:39:30.880]   where you can use our services and custom models soon.
[00:39:30.880 --> 00:39:32.520]   So I think that's a really nice place to be.
[00:39:32.520 --> 00:39:34.200]   It's the layer one for AI.
[00:39:34.200 --> 00:39:35.680]   And you know, we support the whole sector.
[00:39:35.680 --> 00:39:38.360]   So when certain API companies who aren't us
[00:39:38.360 --> 00:39:40.520]   got stuck on GPUs, we unstuck it.
[00:39:40.520 --> 00:39:41.640]   When mid-journey was going,
[00:39:41.640 --> 00:39:44.800]   we gave them a small grant to get going, to do the beta.
[00:39:44.800 --> 00:39:45.760]   'Cause we thought it was amazing
[00:39:45.760 --> 00:39:47.560]   to have this technology out there.
[00:39:47.560 --> 00:39:48.840]   So we really view ourselves
[00:39:48.840 --> 00:39:50.960]   with that infrastructure layer, picks and shovels,
[00:39:50.960 --> 00:39:53.680]   as it were, and then other people build on top of what we do.
[00:39:53.680 --> 00:39:55.960]   You come to us, if you wanna have
[00:39:55.960 --> 00:39:57.560]   the vertically integrated best people
[00:39:57.560 --> 00:39:59.120]   in the world working with you.
[00:39:59.120 --> 00:40:00.600]   And every media company in the world
[00:40:00.600 --> 00:40:02.400]   and video game company needs that.
[00:40:02.400 --> 00:40:04.560]   And there's no other alternative.
[00:40:04.560 --> 00:40:07.280]   Because we build the models, we put them out there,
[00:40:07.280 --> 00:40:10.560]   and we also make them usable through our software.
[00:40:10.560 --> 00:40:11.960]   But, you know, we're not gonna have
[00:40:11.960 --> 00:40:13.160]   a huge number of customers.
[00:40:13.160 --> 00:40:14.640]   It's gonna be very selective,
[00:40:14.640 --> 00:40:16.360]   similar to a Palantir type thing.
[00:40:16.360 --> 00:40:20.040]   And then the tail is where we collaborate with partners
[00:40:20.040 --> 00:40:23.080]   like AWS, who make our models available for everyone.
[00:40:23.080 --> 00:40:24.400]   And we'll have more and more services
[00:40:24.400 --> 00:40:26.200]   around those across modalities.
[00:40:26.200 --> 00:40:29.040]   - How do you think about the difficulty
[00:40:29.040 --> 00:40:33.840]   of the models and technology between image and text?
[00:40:33.840 --> 00:40:37.600]   So obviously there's differences between DALL-E 2
[00:40:37.600 --> 00:40:39.320]   and Stable Diffusion.
[00:40:39.320 --> 00:40:41.040]   But I know you guys are working
[00:40:41.040 --> 00:40:42.280]   on a bunch of different models.
[00:40:42.280 --> 00:40:46.360]   And is the difficulty of making language work harder
[00:40:46.360 --> 00:40:49.720]   than image as you think about the problem set?
[00:40:49.720 --> 00:40:51.720]   Are they roughly the same?
[00:40:51.720 --> 00:40:53.240]   - They're roughly the same.
[00:40:53.240 --> 00:40:55.160]   Language is a lot more semantically dense,
[00:40:55.160 --> 00:40:56.840]   which is why the language models are a lot bigger.
[00:40:56.840 --> 00:40:59.800]   So it's incredibly difficult to get them to work on the edge.
[00:40:59.800 --> 00:41:02.440]   With Stable Diffusion, we announced Distill Stable Diffusion,
[00:41:02.440 --> 00:41:04.120]   which is a 20 times speed up,
[00:41:04.120 --> 00:41:06.040]   which will mean that we've sped up a hundred times
[00:41:06.040 --> 00:41:07.960]   since launch in August.
[00:41:07.960 --> 00:41:09.320]   That means it'll work in one second
[00:41:09.320 --> 00:41:11.840]   on an iPhone without internet, maybe two seconds.
[00:41:11.840 --> 00:41:12.680]   That's insane.
[00:41:12.680 --> 00:41:14.720]   Language models cannot do that.
[00:41:14.720 --> 00:41:17.640]   But language models, you can make great.
[00:41:17.640 --> 00:41:19.600]   So like I said, ChatGPT is an example
[00:41:19.600 --> 00:41:22.320]   that's hitting all of the press around that right now.
[00:41:22.320 --> 00:41:25.920]   So I think this challenge of making these accessible
[00:41:25.920 --> 00:41:27.720]   is gonna be the big one for us.
[00:41:27.720 --> 00:41:29.240]   And again, this is kind of our focus
[00:41:29.240 --> 00:41:30.680]   versus everyone else.
[00:41:30.680 --> 00:41:32.520]   Nobody else in the industry is focused on
[00:41:32.520 --> 00:41:35.400]   how do you get these things working on mobile?
[00:41:35.400 --> 00:41:36.720]   Because that's not their prerogative.
[00:41:36.720 --> 00:41:39.640]   Or how do you build an Indian version of this, et cetera?
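To make the speed and accessibility point above a bit more concrete, here is a minimal sketch of running Stable Diffusion with the Hugging Face diffusers library, using half precision and a reduced sampler step count, the same knobs that distillation pushes much further. The checkpoint name, step count, and the assumption of a CUDA GPU are illustrative choices; the distilled model discussed above is not assumed to be publicly available.

```python
# Minimal sketch (not Stability's production code): standard Stable Diffusion
# via the diffusers library, with two common latency levers shown explicitly.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed public checkpoint
    torch_dtype=torch.float16,         # half precision roughly halves memory
)
pipe = pipe.to("cuda")                 # assumes an NVIDIA GPU is available

# Fewer denoising steps trades a little quality for a lot of latency;
# distillation aims to produce acceptable images in far fewer steps still.
image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    num_inference_steps=20,            # default is around 50
    guidance_scale=7.5,
).images[0]
image.save("lighthouse.png")
```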
[00:41:39.640 --> 00:41:42.800]   - And so do you think that specialized models
[00:41:42.800 --> 00:41:45.880]   will exist and thrive?
[00:41:45.880 --> 00:41:50.000]   Or do you think eventually these converge
[00:41:50.000 --> 00:41:52.080]   on large multimodal models?
[00:41:52.080 --> 00:41:53.480]   I guess in other words, like,
[00:41:53.480 --> 00:41:55.400]   will there be significant leading models
[00:41:55.400 --> 00:41:58.920]   for just text and language and just images and proteins?
[00:42:00.280 --> 00:42:03.800]   Or will there be one large model that's most effective?
[00:42:03.800 --> 00:42:04.840]   - I think it'll be a mixture.
[00:42:04.840 --> 00:42:06.920]   I think every, if you kind of look at it,
[00:42:06.920 --> 00:42:08.600]   what happened is the model's got bigger and bigger
[00:42:08.600 --> 00:42:10.600]   and bigger, trillions of parameters.
[00:42:10.600 --> 00:42:11.440]   - Great.
[00:42:11.440 --> 00:42:13.200]   - Unwieldy and inaccessible for anyone.
[00:42:13.200 --> 00:42:16.080]   Then it turned out that actually we weren't paying enough
[00:42:16.080 --> 00:42:17.320]   attention to the data.
[00:42:17.320 --> 00:42:19.560]   So DeepMind released something called Chinchilla,
[00:42:19.560 --> 00:42:23.720]   which was their version of GPT-3; GPT-3 is 175 billion parameters,
[00:42:23.720 --> 00:42:26.680]   but at 70 billion parameters it outperformed it,
[00:42:26.680 --> 00:42:28.240]   'Cause they just trained it longer.
[00:42:28.240 --> 00:42:30.080]   But if you look at the actual import of that paper,
[00:42:30.080 --> 00:42:33.600]   it was just that you need better data and longer training.
[00:42:33.600 --> 00:42:35.440]   So we don't really know how to optimize these models
[00:42:35.440 --> 00:42:38.240]   'cause they were so big and so compute intensive,
[00:42:38.240 --> 00:42:39.640]   'cause they cost millions of dollars each
[00:42:39.640 --> 00:42:42.400]   that we didn't really see what the differentials are
[00:42:42.400 --> 00:42:43.880]   from data from training and others.
[00:42:43.880 --> 00:42:45.720]   There's a lot of model optimization to go.
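To put the Chinchilla point in equation form: the paper (Hoffmann et al., 2022) fits training loss as a function of parameter count N and training tokens D. The constants below are the paper's approximate published fits, not figures from this conversation.

```latex
% Parametric loss fit from the Chinchilla paper, for N parameters and D tokens:
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad E \approx 1.69,\; A \approx 406.4,\; B \approx 410.7,\;
\alpha \approx 0.34,\; \beta \approx 0.28.
% Minimizing this under a fixed compute budget C \approx 6ND gives
% N_{\mathrm{opt}} \propto C^{\,\approx 0.5} and D_{\mathrm{opt}} \propto C^{\,\approx 0.5}:
% parameters and training tokens should grow roughly in proportion
% (on the order of 20 tokens per parameter), which is why a smaller model
% trained on much more data can beat a far larger, under-trained one.
```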
[00:42:45.720 --> 00:42:47.040]   One of the big breakthroughs now though,
[00:42:47.040 --> 00:42:49.440]   is that we've moved from just deep learning.
[00:42:49.440 --> 00:42:52.120]   So these big supercomputers squishing the data,
[00:42:52.120 --> 00:42:54.240]   moving it back and forth across all these chips,
[00:42:54.240 --> 00:42:56.120]   to now introducing reinforcement learning
[00:42:56.120 --> 00:42:57.520]   with human feedback.
[00:42:57.520 --> 00:42:58.960]   That's where you see how these models
[00:42:58.960 --> 00:43:00.440]   have all the little neurons,
[00:43:00.440 --> 00:43:01.800]   which are these principles they've learned,
[00:43:01.800 --> 00:43:04.600]   light up, when humans actually interact with them.
[00:43:04.600 --> 00:43:06.280]   And you use that to create more specific,
[00:43:06.280 --> 00:43:07.680]   optimized models.
[00:43:07.680 --> 00:43:11.640]   So OpenAI, again, like the leaders in this field,
[00:43:11.640 --> 00:43:15.320]   they created InstructGPT by figuring that out
[00:43:15.320 --> 00:43:16.320]   from GPT-3.
[00:43:16.320 --> 00:43:18.280]   It went from 175 billion parameters
[00:43:18.280 --> 00:43:22.280]   to 1.3 billion parameters, with just as much performance.
[00:43:22.280 --> 00:43:23.800]   And this is one of the things that I think
[00:43:23.800 --> 00:43:26.040]   will really drive it, the combination of deep learning
[00:43:26.040 --> 00:43:27.880]   and understanding how humans interact with these models
[00:43:27.880 --> 00:43:29.600]   to make better models,
[00:43:29.600 --> 00:43:33.040]   and to get better data to build better models again.
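As a rough illustration of the human-feedback step described here, below is a minimal, self-contained sketch of the preference-modelling idea at the core of RLHF: a small reward model is trained so that responses humans preferred score higher than rejected ones (the pairwise Bradley-Terry loss). The toy feature vectors, sizes, and network are invented for illustration; this is not OpenAI's code, and the policy-optimization step that would follow is only mentioned in a comment.

```python
# Toy sketch of reward-model training for RLHF on synthetic data.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Pretend each response has already been encoded as a 16-dim feature vector.
DIM, PAIRS = 16, 256
chosen = torch.randn(PAIRS, DIM) + 0.5    # features of human-preferred responses
rejected = torch.randn(PAIRS, DIM) - 0.5  # features of rejected responses

reward_model = nn.Sequential(nn.Linear(DIM, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

for step in range(200):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Bradley-Terry pairwise loss: push the preferred reward above the rejected one.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final pairwise loss: {loss.item():.3f}")
# In full RLHF, this reward model would then steer a language-model policy
# (for example with PPO) toward outputs that humans prefer.
```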
[00:43:33.040 --> 00:43:35.640]   We're reaching that point now of rapid iteration and feedback,
[00:43:35.640 --> 00:43:37.440]   and that's what we saw with stable diffusion.
[00:43:37.440 --> 00:43:39.800]   We've got 100 times speed up.
[00:43:39.800 --> 00:43:41.560]   So it's 100 times faster than DALL-E 2
[00:43:41.560 --> 00:43:43.640]   as it stands now, basically.
[00:43:43.640 --> 00:43:45.640]   We'll release it shortly.
[00:43:45.640 --> 00:43:47.080]   But there's more to go.
[00:43:47.080 --> 00:43:48.720]   And so I think, again,
[00:43:48.720 --> 00:43:50.400]   none of this really makes sense, to be honest,
[00:43:50.400 --> 00:43:53.720]   Logan, like the fact that a 1.6 gigabyte file
[00:43:53.720 --> 00:43:56.240]   can contain two billion different concepts
[00:43:56.240 --> 00:43:57.760]   and create just about anything now,
[00:43:57.760 --> 00:44:01.000]   pretty much photo realistic, doesn't make sense.
[00:44:01.000 --> 00:44:02.120]   It's insane.
[00:44:02.120 --> 00:44:02.960]   But it's there.
[00:44:02.960 --> 00:44:04.800]   And the fact that it can run on an iPhone without internet
[00:44:04.800 --> 00:44:06.240]   doesn't make sense.
[00:44:06.240 --> 00:44:08.240]   You know, it's orders of magnitude better.
[00:44:08.240 --> 00:44:09.600]   And the question is,
[00:44:09.600 --> 00:44:11.400]   how are we gonna react to this when it happens
[00:44:11.400 --> 00:44:12.920]   for every modality?
[00:44:12.920 --> 00:44:14.640]   'Cause some modalities are more difficult than others,
[00:44:14.640 --> 00:44:16.800]   like 3D is very difficult.
[00:44:16.800 --> 00:44:18.680]   But maybe they'll figure out how to make it easy.
[00:44:18.680 --> 00:44:19.600]   I don't know.
[00:44:19.600 --> 00:44:21.120]   You know, video is difficult.
[00:44:21.120 --> 00:44:22.800]   We're figuring out how to make it easier.
[00:44:22.800 --> 00:44:27.240]   I would note that 80% of all AI researchers will be in this area,
[00:44:27.920 --> 00:44:29.800]   and I reckon $100 billion of investment
[00:44:29.800 --> 00:44:32.640]   will go into this area to accelerate it even more.
[00:44:32.640 --> 00:44:37.280]   - And as you think about your core use case and customers,
[00:44:37.280 --> 00:44:40.040]   you mentioned closer to the Palantir model,
[00:44:40.040 --> 00:44:43.640]   do you think it's mostly gonna be focused on media
[00:44:43.640 --> 00:44:46.400]   and content as kind of the core businesses
[00:44:46.400 --> 00:44:49.400]   that you service, or are there other use cases
[00:44:49.400 --> 00:44:51.840]   or target customers that you're excited about?
[00:44:51.840 --> 00:44:53.200]   - I mean, like I think we can disrupt
[00:44:53.200 --> 00:44:55.800]   the whole of the pharmaceutical industry and healthcare.
[00:44:56.800 --> 00:44:58.720]   And again, it's an area that we know very well
[00:44:58.720 --> 00:45:00.360]   and we've got protein folding and other things,
[00:45:00.360 --> 00:45:01.600]   but these models can be applied
[00:45:01.600 --> 00:45:03.760]   to save billions of dollars there.
[00:45:03.760 --> 00:45:05.080]   - And why is that the case?
[00:45:05.080 --> 00:45:09.560]   - Well, because the whole area is kind of...
[00:45:09.560 --> 00:45:13.560]   Classical systems are ergodic.
[00:45:13.560 --> 00:45:16.880]   So they treat everyone like they're normally distributed,
[00:45:16.880 --> 00:45:17.960]   like that, right?
[00:45:17.960 --> 00:45:19.560]   It's like assuming a thousand tosses of a coin
[00:45:19.560 --> 00:45:21.200]   is the same as one coin tossed a thousand times,
[00:45:21.200 --> 00:45:22.960]   but people are very individualized.
[00:45:22.960 --> 00:45:24.280]   We didn't have the systems to be able
[00:45:24.280 --> 00:45:26.920]   to have personalization and understanding of principles
[00:45:26.920 --> 00:45:28.320]   and we've got them both at the same time now
[00:45:28.320 --> 00:45:29.560]   with this technology.
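The coin-toss remark is essentially an ergodicity argument, and a tiny simulation makes it concrete: for a multiplicative process, the average across many people is not the typical outcome for any one person over time. The +50%/-40% bet below is a standard textbook illustration, not a figure from the conversation.

```python
# Toy ergodicity demo: ensemble average vs. the typical individual outcome.
import numpy as np

rng = np.random.default_rng(0)
people, tosses = 50_000, 50
up, down = 1.5, 0.6            # win +50%, lose -40%; expected value per toss is +5%

flips = rng.integers(0, 2, size=(people, tosses))
factors = np.where(flips == 1, up, down)
wealth = factors.prod(axis=1)  # each person's wealth multiple after 50 tosses

print("average across people:", round(float(wealth.mean()), 2))     # well above 1
print("median individual:    ", round(float(np.median(wealth)), 4)) # well below 1
# Same positive expectation for everyone, very different typical outcomes,
# which is why treating individuals as an interchangeable average breaks down.
```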
[00:45:29.560 --> 00:45:31.960]   When you apply this appropriately to healthcare and bio,
[00:45:31.960 --> 00:45:33.040]   you have massive breakthroughs
[00:45:33.040 --> 00:45:34.400]   and there are drug development companies
[00:45:34.400 --> 00:45:35.560]   and others doing that,
[00:45:35.560 --> 00:45:37.080]   but they're all building their own infrastructure
[00:45:37.080 --> 00:45:39.520]   when they should be using a unified infrastructure.
[00:45:39.520 --> 00:45:41.960]   And this is where open source is super powerful,
[00:45:41.960 --> 00:45:43.400]   'cause I'll give you an example,
[00:45:43.400 --> 00:45:45.560]   by releasing our model and having the traction,
[00:45:45.560 --> 00:45:48.440]   Apple optimized it for the neural engine on the M1
[00:45:48.440 --> 00:45:50.040]   and the other architectures.
[00:45:50.040 --> 00:45:53.840]   It's the first model ever to basically have that.
[00:45:54.320 --> 00:45:55.600]   You can set the standards around this
[00:45:55.600 --> 00:45:57.560]   and you can really drive forward the sectors.
[00:45:57.560 --> 00:45:59.320]   And from a business perspective, again,
[00:45:59.320 --> 00:46:00.720]   we are the business that does this.
[00:46:00.720 --> 00:46:02.640]   There is nobody else.
[00:46:02.640 --> 00:46:03.480]   That's multimodal.
[00:46:03.480 --> 00:46:06.440]   There is nobody else that is media focused initially.
[00:46:06.440 --> 00:46:09.840]   So media is our number one thing, media and video games.
[00:46:09.840 --> 00:46:11.480]   And so if anyone wants to have the best,
[00:46:11.480 --> 00:46:12.600]   they come to us.
[00:46:12.600 --> 00:46:14.200]   But we can only work with a few entities,
[00:46:14.200 --> 00:46:16.200]   but that's okay because we can have an entire
[00:46:16.200 --> 00:46:17.520]   massive business based on that,
[00:46:17.520 --> 00:46:20.200]   just like Google has a great business based on that.
[00:46:20.200 --> 00:46:22.040]   You know, these are the sectors.
[00:46:22.040 --> 00:46:23.600]   We can do the sectors one by one by one,
[00:46:23.600 --> 00:46:25.760]   but I don't see any sector that's not affected
[00:46:25.760 --> 00:46:27.240]   by this technology.
[00:46:27.240 --> 00:46:30.120]   The only question is for us as a business,
[00:46:30.120 --> 00:46:32.520]   who do we work with partners to go out to
[00:46:32.520 --> 00:46:35.040]   with this infrastructure and who do we do ourselves?
[00:46:35.040 --> 00:46:37.560]   I mean, it's kind of like what the promise of web three was
[00:46:37.560 --> 00:46:40.480]   many years ago, but web three was always an economic incentive
[00:46:40.480 --> 00:46:43.400]   that was trying to be bootstrapped to real life use cases
[00:46:43.400 --> 00:46:46.120]   whereas this is real life use cases right now.
[00:46:46.120 --> 00:46:48.080]   There's real, live value right now.
[00:46:48.080 --> 00:46:49.440]   I think that's why it's exploded.
[00:46:49.440 --> 00:46:51.680]   And next year we'll go even bigger.
[00:46:51.680 --> 00:46:54.840]   - One of the interesting elements of this is where it's really coming from.
[00:46:54.840 --> 00:46:59.320]   I mean, historically automation and AI has been assumed
[00:46:59.320 --> 00:47:02.080]   to start at the simpler levels, right?
[00:47:02.080 --> 00:47:04.560]   Or the more manual levels,
[00:47:04.560 --> 00:47:07.120]   and move its way up the stack.
[00:47:07.120 --> 00:47:10.560]   This is actually fundamentally shifting that paradigm
[00:47:10.560 --> 00:47:14.120]   and coming at it from the creative,
[00:47:14.120 --> 00:47:19.120]   more knowledge worker based systems and processes
[00:47:19.680 --> 00:47:23.600]   and actually potentially automating away a bunch
[00:47:23.600 --> 00:47:25.680]   of those different jobs.
[00:47:25.680 --> 00:47:28.600]   When you think about it, like, no industry is not going to be
[00:47:28.600 --> 00:47:31.800]   impacted by this in some way.
[00:47:31.800 --> 00:47:34.120]   I'm just fascinated to hear you riff on some
[00:47:34.120 --> 00:47:36.000]   of the different use cases or industries.
[00:47:36.000 --> 00:47:39.680]   Have you seen it applied in any ways
[00:47:39.680 --> 00:47:42.360]   that were unexpected to you since this has been out
[00:47:42.360 --> 00:47:45.320]   in the world in August that would be tangible
[00:47:45.320 --> 00:47:46.800]   for people to hear about?
[00:47:47.920 --> 00:47:50.080]   - I mean, it's been applied in crazy ways.
[00:47:50.080 --> 00:47:54.560]   People who use it to create 3D VR simulations instantly.
[00:47:54.560 --> 00:47:58.520]   It was used to create synthetic data on lung scans
[00:47:58.520 --> 00:48:01.200]   to identify cancer, by Stanford AIMI,
[00:48:01.200 --> 00:48:03.880]   because they didn't have enough data sets,
[00:48:03.880 --> 00:48:06.040]   so they created more data sets like the ones they had.
[00:48:06.040 --> 00:48:08.240]   Yesterday there was something called Riffusion,
[00:48:08.240 --> 00:48:12.120]   whereby they took spectrograms of music
[00:48:12.120 --> 00:48:13.600]   and trained it on that.
[00:48:13.600 --> 00:48:15.520]   And now from those spectrograms, you know,
[00:48:15.520 --> 00:48:17.840]   the little things, it can generate brand new music
[00:48:17.840 --> 00:48:22.320]   of any type, which is a bit crazy.
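For the curious, here is a minimal sketch of the spectrogram round trip that the Riffusion idea relies on: audio becomes a magnitude spectrogram (an image-like 2D array), and a waveform can be recovered approximately with Griffin-Lim. This is not Riffusion's actual code; the file paths and parameters are placeholders, and it assumes the librosa and soundfile libraries are installed.

```python
# Audio -> spectrogram "image" -> audio again (the representation trick
# behind treating music generation as image generation).
import librosa
import numpy as np
import soundfile as sf

y, sr = librosa.load("clip.wav", sr=22050, mono=True)  # placeholder input file

# STFT magnitude: this 2D array is what gets treated as an image and fed to
# (or produced by) an image diffusion model in the Riffusion approach.
n_fft, hop = 2048, 512
S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))

# Pretend S came back from a generative model: estimate the missing phase
# with Griffin-Lim to turn the spectrogram back into a waveform.
y_hat = librosa.griffinlim(S, n_iter=64, n_fft=n_fft, hop_length=hop)
sf.write("reconstructed.wav", y_hat, sr)
```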
[00:48:22.320 --> 00:48:25.040]   So I think that nobody really knows
[00:48:25.040 --> 00:48:26.480]   what the long-term implications of this are.
[00:48:26.480 --> 00:48:28.280]   A lot of people think that it should just be words in,
[00:48:28.280 --> 00:48:31.560]   images out, or words in, text out,
[00:48:31.560 --> 00:48:32.920]   but the real impact's going to come
[00:48:32.920 --> 00:48:34.880]   when people really sit down and think about
[00:48:34.880 --> 00:48:37.080]   which parts of the creative process,
[00:48:37.080 --> 00:48:40.400]   the construction process, the office process,
[00:48:40.400 --> 00:48:43.480]   would benefit from a little bit of some sort of entity
[00:48:43.480 --> 00:48:45.280]   that understands the nature of structured
[00:48:45.280 --> 00:48:47.720]   and unstructured data and the barriers between them,
[00:48:47.720 --> 00:48:49.840]   and can blur those barriers.
[00:48:49.840 --> 00:48:52.440]   And I can't really think of many things
[00:48:52.440 --> 00:48:55.840]   that aren't disrupted by that in the knowledge workspace.
[00:48:55.840 --> 00:48:58.240]   In the manual workspace, it's more difficult
[00:48:58.240 --> 00:49:00.080]   'cause you needed to have robotics and things like that,
[00:49:00.080 --> 00:49:01.520]   high capex, right?
[00:49:01.520 --> 00:49:05.320]   There is no capex required to do this at a base level.
[00:49:05.320 --> 00:49:06.760]   When you wanna build your own custom models,
[00:49:06.760 --> 00:49:08.480]   yeah, it costs millions of dollars.
[00:49:08.480 --> 00:49:10.160]   But only a few companies will do that,
[00:49:10.160 --> 00:49:12.160]   which is why our focus is on a few companies.
[00:49:12.160 --> 00:49:15.240]   And for everyone else, it's just making these models usable.
[00:49:15.240 --> 00:49:18.560]   - What concerns you the most about all of this
[00:49:18.560 --> 00:49:22.840]   being out in the world now from a societal impact standpoint?
[00:49:22.840 --> 00:49:24.600]   - I mean, look at the unknowns, right?
[00:49:24.600 --> 00:49:26.360]   Nobody really knows what's gonna happen.
[00:49:26.360 --> 00:49:29.080]   The bad guys already have this technology.
[00:49:29.080 --> 00:49:31.160]   I should know, for a number of reasons, right?
[00:49:31.160 --> 00:49:34.040]   And I don't know if we'll be able to catch up
[00:49:34.040 --> 00:49:36.840]   enough as a society to the bad actors
[00:49:36.840 --> 00:49:39.680]   who have better versions of this technology,
[00:49:39.680 --> 00:49:42.680]   but then also some of these knock on effects,
[00:49:42.680 --> 00:49:44.920]   like, you know, anyone can create anything,
[00:49:44.920 --> 00:49:47.000]   anyone can write anything.
[00:49:47.000 --> 00:49:49.840]   Instantly, like, there's no more barriers to this.
[00:49:49.840 --> 00:49:51.400]   What are the knock-on knock-on effects of this?
[00:49:51.400 --> 00:49:53.240]   I don't know, nobody knows.
[00:49:53.240 --> 00:49:54.840]   So that's the thing that worries me a lot,
[00:49:54.840 --> 00:49:56.560]   but then the other part of me is like,
[00:49:56.560 --> 00:49:59.680]   the alternative is this technology is only controlled
[00:49:59.680 --> 00:50:02.680]   by large organizations and they're full of good people,
[00:50:02.680 --> 00:50:04.320]   but they do bad things,
[00:50:04.320 --> 00:50:06.680]   and they will use it to serve us more ads.
[00:50:06.680 --> 00:50:08.920]   Or we could use it to activate humanity's potential
[00:50:08.920 --> 00:50:11.480]   by bringing this to the world and having an open debate.
[00:50:11.480 --> 00:50:14.400]   And so when challenges come up, we can react to it together.
[00:50:14.960 --> 00:50:16.320]   You know?
[00:50:16.320 --> 00:50:18.200]   So that's how I try to mitigate against that.
[00:50:18.200 --> 00:50:20.440]   That's kind of my approach of philosophy.
[00:50:20.440 --> 00:50:25.520]   - Who's the more forward thinking in terms of the risks
[00:50:25.520 --> 00:50:27.960]   and trade-offs around all of this?
[00:50:27.960 --> 00:50:31.480]   Because as you mentioned earlier, ethics, morality, laws,
[00:50:31.480 --> 00:50:34.200]   all of those things are very, very different.
[00:50:34.200 --> 00:50:38.640]   in the UK versus the US. I think I've heard you reference,
[00:50:38.640 --> 00:50:40.440]   like, that there's no absolute moral framework
[00:50:40.440 --> 00:50:41.480]   for things in the world.
[00:50:41.480 --> 00:50:46.480]   And like most people that do harm or ill as we view it,
[00:50:46.480 --> 00:50:49.600]   they can talk themselves into anything
[00:50:49.600 --> 00:50:51.200]   and believe in anything, right?
[00:50:51.200 --> 00:50:54.320]   So I guess I'm interested in who's at the forefront
[00:50:54.320 --> 00:50:57.800]   of thinking through some of these things
[00:50:57.800 --> 00:51:02.320]   and what's gonna be the governing body in your mind
[00:51:02.320 --> 00:51:06.000]   that actually comes to these types of decisions?
[00:51:06.000 --> 00:51:07.640]   - I've been very disappointed.
[00:51:07.640 --> 00:51:10.160]   I've not met anyone who's really thought this through.
[00:51:10.160 --> 00:51:13.760]   The classical thing is either massive techno-optimism,
[00:51:13.760 --> 00:51:16.680]   "everything will be absolutely fine," or massive
[00:51:16.680 --> 00:51:17.600]   ultra-orthodoxy,
[00:51:17.600 --> 00:51:20.480]   "this technology is too dangerous to ever release."
[00:51:20.480 --> 00:51:23.040]   There are very few people who are kind of straddling
[00:51:23.040 --> 00:51:24.200]   the line between.
[00:51:24.200 --> 00:51:26.280]   I think UK government's probably the most forward thinking
[00:51:26.280 --> 00:51:27.120]   on this.
[00:51:27.120 --> 00:51:29.480]   The European Union is the most regressive.
[00:51:29.480 --> 00:51:30.960]   They're looking to ban general-purpose
[00:51:30.960 --> 00:51:33.960]   artificial intelligence and be the regulatory leaders,
[00:51:33.960 --> 00:51:34.880]   which is stupid.
[00:51:34.880 --> 00:51:36.840]   Europeans will fall behind.
[00:51:36.840 --> 00:51:39.560]   The US is trying to figure out where it stands on this.
[00:51:40.280 --> 00:51:41.480]   In terms of governments.
[00:51:41.480 --> 00:51:42.680]   Like I said, even on the individuals,
[00:51:42.680 --> 00:51:45.680]   I just think that it's really complex.
[00:51:45.680 --> 00:51:47.160]   And I can't think of a governance structure
[00:51:47.160 --> 00:51:48.000]   that can handle this.
[00:51:48.000 --> 00:51:49.040]   'Cause one of the questions is like,
[00:51:49.040 --> 00:51:50.520]   do we give this to the Linux Foundation
[00:51:50.520 --> 00:51:52.400]   or something like that for the decisions?
[00:51:52.400 --> 00:51:53.880]   Not really.
[00:51:53.880 --> 00:51:55.000]   These are de novo things
[00:51:55.000 --> 00:51:58.200]   and you have to kind of make decisions around this.
[00:51:58.200 --> 00:52:01.480]   But I think, you know, a lot of the press against us,
[00:52:01.480 --> 00:52:03.400]   we had a lot of positive press, a lot of negative press,
[00:52:03.400 --> 00:52:04.240]   but at least it's out there.
[00:52:04.240 --> 00:52:05.760]   And I like that it's out there
[00:52:05.760 --> 00:52:07.280]   because it means that people are having
[00:52:07.280 --> 00:52:08.600]   really strong discussions.
[00:52:08.600 --> 00:52:10.600]   I think we need to have more structured forums
[00:52:10.600 --> 00:52:12.640]   to have these discussions in a proper way,
[00:52:12.640 --> 00:52:14.120]   not an emotional way.
[00:52:14.120 --> 00:52:15.360]   And I think that will happen,
[00:52:15.360 --> 00:52:17.160]   hopefully as these things get more exponential
[00:52:17.160 --> 00:52:18.440]   and we will host some of them
[00:52:18.440 --> 00:52:21.520]   and we'll invite as many people as we can into this.
[00:52:21.520 --> 00:52:23.280]   There's also, like I said, our communities
[00:52:23.280 --> 00:52:25.320]   around language, around BioML and others
[00:52:25.320 --> 00:52:27.360]   we're spinning into independent foundations
[00:52:27.360 --> 00:52:29.000]   to handle small parts of this.
[00:52:29.000 --> 00:52:32.400]   So I'm inviting everyone in on that.
[00:52:32.400 --> 00:52:34.240]   So we shouldn't be making these decisions
[00:52:34.240 --> 00:52:35.680]   and neither should anyone else alone,
[00:52:35.680 --> 00:52:38.560]   about what a benchmark model is.
[00:52:38.560 --> 00:52:41.120]   But then for the overall guiding thing,
[00:52:41.120 --> 00:52:43.760]   yeah, nobody really knows, unfortunately.
[00:52:43.760 --> 00:52:45.520]   - The last four months of your life
[00:52:45.520 --> 00:52:47.120]   or five months of your life or whatever it's been,
[00:52:47.120 --> 00:52:49.840]   I'm sure have been pretty unusual.
[00:52:49.840 --> 00:52:51.880]   What's it been like at a personal level?
[00:52:51.880 --> 00:52:56.800]   One, to have the largest open source community in history
[00:52:56.800 --> 00:52:59.680]   or at least trending in that way, as well as,
[00:52:59.680 --> 00:53:02.280]   I mean, now you've become something
[00:53:02.280 --> 00:53:05.760]   of more of a public figure than you were in the past.
[00:53:05.760 --> 00:53:08.240]   What has this been at a personal level
[00:53:08.240 --> 00:53:10.280]   managing those two things?
[00:53:10.280 --> 00:53:12.800]   - It's been really tiring, stressful.
[00:53:12.800 --> 00:53:14.440]   I mean, look, I never wanted to be a public figure.
[00:53:14.440 --> 00:53:17.160]   I'm Asperger's, ADHD, I hate publicity.
[00:53:17.160 --> 00:53:19.520]   But like this needed to have a figurehead
[00:53:19.520 --> 00:53:21.200]   and someone to lay the blame on.
[00:53:21.200 --> 00:53:23.200]   And I also got the positives from that, you know,
[00:53:23.200 --> 00:53:24.680]   built a great company.
[00:53:24.680 --> 00:53:26.400]   I had lots of failures in the past.
[00:53:26.400 --> 00:53:28.560]   I'm just trying to do my best
[00:53:28.560 --> 00:53:31.040]   because unfortunately a lot of this stuff keeps centralizing
[00:53:31.040 --> 00:53:33.040]   so I keep trying to give away a lot of authority
[00:53:33.040 --> 00:53:34.480]   and it comes back to me.
[00:53:34.480 --> 00:53:37.440]   And that's really heavy, and it's a heavy burden to bear.
[00:53:37.440 --> 00:53:39.160]   But like I said, there are positives
[00:53:39.160 --> 00:53:40.240]   that counterbalance that.
[00:53:40.240 --> 00:53:41.320]   I hope I do the right thing.
[00:53:41.320 --> 00:53:43.680]   But the amazing thing now is that
[00:53:43.680 --> 00:53:45.920]   some of the smartest and best people in the world
[00:53:45.920 --> 00:53:47.720]   in various sectors are reaching out to us
[00:53:47.720 --> 00:53:49.240]   and joining stability.
[00:53:49.240 --> 00:53:51.000]   So I think if we improve as an organization
[00:53:51.000 --> 00:53:53.000]   and build a great organization people can be part of,
[00:53:53.000 --> 00:53:55.120]   you know, like Google in 2012,
[00:53:55.120 --> 00:53:57.440]   then maybe this can be dispersed
[00:53:57.440 --> 00:53:59.080]   amongst really intelligent people
[00:53:59.080 --> 00:54:01.280]   who've got good hearts working in the right way.
[00:54:01.280 --> 00:54:03.280]   I do think we need to be more transparent as well.
[00:54:03.280 --> 00:54:05.720]   Like there's a tendency to keep things closed
[00:54:05.720 --> 00:54:08.160]   for a variety of reasons.
[00:54:08.160 --> 00:54:09.760]   And so I hope we can become a really transparent,
[00:54:09.760 --> 00:54:11.720]   great organization full of great people.
[00:54:11.720 --> 00:54:12.960]   'Cause then I can go and finish
[00:54:12.960 --> 00:54:14.920]   like my video games and things like that.
[00:54:14.920 --> 00:54:17.400]   And then take a back seat.
[00:54:17.400 --> 00:54:21.240]   - Who do you turn to for, I'm sure you have tons
[00:54:21.240 --> 00:54:24.080]   of people willing to offer advice or opinions.
[00:54:24.080 --> 00:54:26.360]   Like what is your, the group of people
[00:54:26.360 --> 00:54:29.640]   you keep around the table that keep you level headed
[00:54:29.640 --> 00:54:32.560]   and balanced and help you drive towards the right
[00:54:32.560 --> 00:54:35.520]   North Star, or does it inevitably end up being just
[00:54:35.520 --> 00:54:37.520]   your gut at the end of the day
[00:54:37.520 --> 00:54:41.280]   in terms of what good actually means in this world
[00:54:41.280 --> 00:54:44.120]   of no absolute moral framework around that stuff.
[00:54:44.120 --> 00:54:46.200]   - I mean, that's a complex thing, right?
[00:54:46.200 --> 00:54:47.680]   Kind of getting into metaphysics and things like that.
[00:54:47.680 --> 00:54:49.240]   Now I've got my friends, you know,
[00:54:49.240 --> 00:54:51.600]   put a board together because they can give me advice,
[00:54:51.600 --> 00:54:55.360]   got some excellent people like Shree and Jim on there
[00:54:55.360 --> 00:54:56.560]   looking after the business side of things.
[00:54:56.560 --> 00:54:58.160]   And it's the team as well.
[00:54:58.160 --> 00:55:00.120]   Again, I need to communicate better with the team,
[00:55:00.120 --> 00:55:03.280]   but they tell me very directly when I'm being stupid,
[00:55:03.280 --> 00:55:07.440]   or, you know, sometimes a CEO overhypes things
[00:55:07.440 --> 00:55:09.400]   'cause he gets excited.
[00:55:09.400 --> 00:55:11.160]   So there's plenty of checks and balances
[00:55:11.160 --> 00:55:13.760]   'cause also I'm quite an approachable person, I hope.
[00:55:13.760 --> 00:55:18.000]   But yeah, like I said, ultimately,
[00:55:18.000 --> 00:55:20.240]   founder led companies can only sustain for so long
[00:55:20.240 --> 00:55:21.560]   and we're in this transition point now
[00:55:21.560 --> 00:55:23.320]   where we're gonna become a process driven company
[00:55:23.320 --> 00:55:25.640]   and we have to, not just from a business perspective,
[00:55:25.640 --> 00:55:27.440]   but also from an importance perspective,
[00:55:27.440 --> 00:55:29.240]   given the impact of what we're doing
[00:55:29.240 --> 00:55:31.000]   and our place in the ecosystem.
[00:55:31.000 --> 00:55:34.400]   Like again, build a great company to help a billion people,
[00:55:34.400 --> 00:55:37.520]   but at the same time, this is an important point in humanity
[00:55:37.520 --> 00:55:39.120]   given this technology, like I said,
[00:55:39.120 --> 00:55:41.040]   next year is gonna take off everywhere.
[00:55:41.040 --> 00:55:43.120]   Like every single graphic designer
[00:55:43.120 --> 00:55:44.800]   and every single person will use stable diffusion
[00:55:44.800 --> 00:55:46.360]   in some way or another.
[00:55:46.360 --> 00:55:48.320]   Every single person doing their homework
[00:55:48.320 --> 00:55:49.880]   will use ChatGPT.
[00:55:49.880 --> 00:55:52.280]   You know, these are big changes that are coming through.
[00:55:52.280 --> 00:55:56.040]   And it's been building, I think you've seen it building,
[00:55:56.040 --> 00:55:58.640]   but now it's this breakthrough moment that's occurring.
[00:55:58.640 --> 00:56:02.400]   - Why do you pick next year as the time
[00:56:02.400 --> 00:56:03.760]   that this is all gonna happen?
[00:56:03.760 --> 00:56:07.000]   Is it just extrapolating the exponential growth curves?
[00:56:07.000 --> 00:56:10.440]   - Real-time Stable Diffusion, apps like Lensa
[00:56:10.440 --> 00:56:12.360]   hitting number one on the App Store,
[00:56:12.360 --> 00:56:15.600]   showing the value of creativity, like a 500 million run rate.
[00:56:15.600 --> 00:56:18.440]   And like I said, the final element to that was ChatGPT,
[00:56:18.440 --> 00:56:20.120]   whereby every single smart kid I know now
[00:56:20.120 --> 00:56:23.680]   is using it to do their homework, at least 80%, right?
[00:56:23.680 --> 00:56:26.160]   Like it's good enough, fast enough, cheap enough.
[00:56:26.160 --> 00:56:27.840]   And that is the take off point.
[00:56:27.840 --> 00:56:30.320]   - Great, well, Emad, thanks for coming on and doing this.
[00:56:30.320 --> 00:56:31.520]   This is super fun.
[00:56:31.520 --> 00:56:33.920]   It gives us a lot to think about.
[00:56:33.920 --> 00:56:37.480]   So thanks for coming in and answering all those questions.
[00:56:37.480 --> 00:56:38.800]   - My pleasure, thanks for having me.
[00:56:38.800 --> 00:56:39.960]   Cheers, you take care.
[00:56:39.960 --> 00:56:42.560]   (upbeat music)
[00:56:42.560 --> 00:56:47.760]   - So that'll do it for the 46th episode of Cartoon Avatars.
[00:56:47.760 --> 00:56:51.640]   Thank you to Emad Mostaque for coming on
[00:56:51.640 --> 00:56:54.160]   and having that conversation.
[00:56:54.160 --> 00:56:58.000]   And thank you to Andrew, Justin, Jenny, Rashad,
[00:56:58.000 --> 00:57:01.200]   and everyone that helped out on this episode.
[00:57:01.200 --> 00:57:05.280]   Look forward to seeing everyone next week
[00:57:05.280 --> 00:57:06.920]   on the 47th episode.
[00:57:06.920 --> 00:57:09.120]   We have a guest that I've long admired
[00:57:09.120 --> 00:57:11.800]   and someone that I was definitely excited to have on.
[00:57:11.800 --> 00:57:15.200]   And so you'll definitely enjoy that one as well.
[00:57:15.200 --> 00:57:16.600]   Thanks everyone for listening.
[00:57:16.600 --> 00:57:19.180]   (upbeat music)
[00:57:19.180 --> 00:57:21.760]   (upbeat music)
[00:57:21.760 --> 00:57:31.760]   (upbeat music)
[00:57:31.760 --> 00:57:34.340]   (upbeat music)

