https://www.listennotes.com/podcasts/the-logan-bartlett/ep-46-stability-ai-ceo-emad-8PQIYcR3r2i/
[00:00:00.000 --> 00:00:02.580] (upbeat music)
[00:00:02.580 --> 00:00:07.560] - Welcome to the 46th episode of Cartoon Avatars.
[00:00:07.560 --> 00:00:09.840] I am your host Logan Bartlett.
[00:00:09.840 --> 00:00:11.960] Welcome back from break.
[00:00:11.960 --> 00:00:13.280] Thanks everyone for bearing with us
[00:00:13.280 --> 00:00:16.160] as we took a pause over the last couple of weeks.
[00:00:16.160 --> 00:00:18.200] We're excited for this episode.
[00:00:18.200 --> 00:00:19.880] This, what you're gonna hear on this episode
[00:00:19.880 --> 00:00:22.480] is a conversation that I had with Emad Mostaque.
[00:00:22.480 --> 00:00:27.400] And Emad is the founder and CEO of Stability AI,
[00:00:27.400 --> 00:00:29.840] which is the largest contributor to Stable Diffusion.
[00:00:29.840 --> 00:00:32.280] Stable Diffusion is the fastest growing
[00:00:32.280 --> 00:00:34.120] open source project of all time.
[00:00:34.120 --> 00:00:39.080] It's one of the leading platforms in generative AI.
[00:00:39.080 --> 00:00:41.720] And Emad and I had a really interesting conversation
[00:00:41.720 --> 00:00:43.080] about a bunch of different things,
[00:00:43.080 --> 00:00:47.720] but we dive into the state of artificial intelligence today,
[00:00:47.720 --> 00:00:50.920] why this is possible, when it wasn't in the past,
[00:00:50.920 --> 00:00:53.160] where this is going in the future,
[00:00:53.160 --> 00:00:56.760] how he differentiates versus competitors like OpenAI.
[00:00:56.760 --> 00:00:59.560] Really fun conversation and appreciate him
[00:00:59.560 --> 00:01:01.120] for powering through.
[00:01:01.120 --> 00:01:02.760] He was a little sick as we were doing this.
[00:01:02.760 --> 00:01:07.760] So it was a fun conversation and I appreciate him doing it with me.
[00:01:07.760 --> 00:01:09.840] And so before you hear that,
[00:01:09.840 --> 00:01:11.560] we talked a little bit about this before break,
[00:01:11.560 --> 00:01:13.240] but we are gonna make a more concerted effort
[00:01:13.240 --> 00:01:16.440] to get people to like and subscribe
[00:01:16.440 --> 00:01:20.240] and share and review the podcast itself.
[00:01:20.240 --> 00:01:22.560] And so if you're whatever platform you're listening on,
[00:01:22.560 --> 00:01:25.720] if it's YouTube, if it's Spotify, if it's Apple,
[00:01:25.720 --> 00:01:29.040] whatever it is, if people could go ahead and like
[00:01:29.040 --> 00:01:31.640] and subscribe and leave a review,
[00:01:31.640 --> 00:01:33.880] share with a friend, all of that stuff.
[00:01:33.880 --> 00:01:35.480] We're trying to figure out exactly what direction
[00:01:35.480 --> 00:01:36.640] to take this in.
[00:01:36.640 --> 00:01:40.080] And so that validation and feedback
[00:01:40.080 --> 00:01:42.520] and also the growth that comes along with all that stuff
[00:01:42.520 --> 00:01:44.880] is super appreciated.
[00:01:44.880 --> 00:01:46.880] It's not something we had been comfortable
[00:01:46.880 --> 00:01:48.240] asking for to date,
[00:01:48.240 --> 00:01:51.080] but as we kind of figure out what direction we're gonna go,
[00:01:51.080 --> 00:01:54.840] we'd love to see more shares, more reviews, more views,
[00:01:54.840 --> 00:01:56.200] more likes, all that stuff.
[00:01:56.200 --> 00:02:00.520] So really appreciate everyone's support in doing that.
[00:02:00.520 --> 00:02:02.520] And so without further delay,
[00:02:02.520 --> 00:02:04.280] what you're gonna hear now is the conversation with me
[00:02:04.280 --> 00:02:06.880] and Emad Mostaque from Stability AI.
[00:02:06.880 --> 00:02:09.920] All right, Emad Mostaque.
[00:02:09.920 --> 00:02:10.800] Did I say that right?
[00:02:10.800 --> 00:02:12.120] - Yep. - Perfect.
[00:02:12.120 --> 00:02:13.680] Thank you for doing this.
[00:02:13.680 --> 00:02:16.560] Founder of Stability AI,
[00:02:16.560 --> 00:02:20.480] one of the main contributors to stable diffusion.
[00:02:20.480 --> 00:02:23.480] Thank you for coming on here today.
[00:02:23.480 --> 00:02:24.440] - A pleasure, Logan.
[00:02:24.440 --> 00:02:25.960] Happy to be here.
[00:02:25.960 --> 00:02:26.800] - Yeah, totally.
[00:02:26.800 --> 00:02:30.120] So maybe at a highest level, we can start off with
[00:02:30.120 --> 00:02:32.560] what is generative AI?
[00:02:32.560 --> 00:02:34.760] How would you define that for the average person?
[00:02:34.760 --> 00:02:37.880] - So I think everyone's heard of kind of the concept
[00:02:37.880 --> 00:02:40.560] of big data, 'cause the whole of the internet previously
[00:02:40.560 --> 00:02:41.680] ran on big data.
[00:02:41.680 --> 00:02:44.640] Large, large models built by Google and Facebook
[00:02:44.640 --> 00:02:47.000] and others to basically target you ads;
[00:02:47.000 --> 00:02:49.200] ads were the main part of that.
[00:02:49.200 --> 00:02:51.640] And these models extended.
[00:02:51.640 --> 00:02:54.480] So they had a generalized model of what a person was like
[00:02:54.480 --> 00:02:56.600] and then your specific interests,
[00:02:56.600 --> 00:03:00.480] like Emad likes green hoodies or Logan likes black jumpers.
[00:03:00.480 --> 00:03:05.520] That then extrapolated from the previous thing to what the next thing was.
[00:03:05.520 --> 00:03:08.080] They're like extension models, inferring what was there.
[00:03:08.080 --> 00:03:09.560] Generative models are a bit different
[00:03:09.560 --> 00:03:12.160] in that they learn principles from structured
[00:03:12.160 --> 00:03:16.080] and unstructured data and then they can generate new things
[00:03:16.080 --> 00:03:17.400] based on those principles.
[00:03:17.400 --> 00:03:21.680] So you could ask it to write an essay about bubble sort
[00:03:21.680 --> 00:03:24.840] or a sonnet about Shakespeare, or anything else
[00:03:24.840 --> 00:03:26.840] which is digital, so you can do that.
[00:03:26.840 --> 00:03:29.960] Or in the case of some of the work that we're most famous for
[00:03:29.960 --> 00:03:32.800] you enter in a labradoodle with a hat
[00:03:32.800 --> 00:03:34.960] and a stained glass window and it understands that
[00:03:34.960 --> 00:03:37.200] and then creates that in a few seconds.
[00:03:37.200 --> 00:03:39.360] So I'd say that's probably the biggest difference
[00:03:39.360 --> 00:03:40.880] between this new type of generative AI
[00:03:40.880 --> 00:03:43.040] and then that old type of AI.
[00:03:43.040 --> 00:03:44.880] So the way that I'd also say it is that we've moved
[00:03:44.880 --> 00:03:47.720] from a big data era to more of a big model era,
[00:03:47.720 --> 00:03:50.600] 'cause these models are very difficult to create and train,
[00:03:50.600 --> 00:03:53.480] which is why only a few companies such as ours do it.
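A minimal sketch of the text-to-image generation Emad describes here, using the open-source Hugging Face `diffusers` library as an assumed toolkit; the checkpoint name, prompt, and file name are illustrative, not something named in the conversation:

```python
# Sketch: turn a text prompt into a new image with a latent diffusion model.
# Assumes the `diffusers` and `torch` packages and a CUDA GPU are available.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # assumed public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# The model generates a brand-new image from learned principles,
# rather than retrieving or classifying an existing one.
image = pipe("a labradoodle with a hat in a stained glass window").images[0]
image.save("labradoodle_stained_glass.png")
```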
[00:03:53.480 --> 00:03:57.800] - And the key point there is the predictive nature
[00:03:57.800 --> 00:04:01.560] of these models and the ability to actually take
[00:04:01.560 --> 00:04:05.760] not just what was given to it but also act on all
[00:04:05.760 --> 00:04:08.840] the other things to sort of self-create in that way.
[00:04:08.840 --> 00:04:11.280] Or what was the distinction that you were drawing there?
[00:04:11.280 --> 00:04:16.280] - Yeah, so like if you built a dog classifier previously
[00:04:16.280 --> 00:04:19.840] and then a new type of dog species came along
[00:04:20.720 --> 00:04:23.440] then the classifier wouldn't be able to understand it
[00:04:23.440 --> 00:04:25.480] 'cause it didn't really understand the concept of a dog.
[00:04:25.480 --> 00:04:28.600] - It was just responding to all the history of things
[00:04:28.600 --> 00:04:31.240] that it had been fed in terms of, I guess,
[00:04:31.240 --> 00:04:33.280] fed is an interesting analogy for a dog here
[00:04:33.280 --> 00:04:35.880] but all the history of data, it didn't have any
[00:04:35.880 --> 00:04:37.880] understanding that it was a dog, it was just, hey,
[00:04:37.880 --> 00:04:41.320] here's all the parameters around which this thing
[00:04:41.320 --> 00:04:42.160] seems defined.
[00:04:42.160 --> 00:04:43.800] - Is dog-like, yeah.
[00:04:43.800 --> 00:04:46.320] You know, this is one of the issues with self-driving cars.
[00:04:46.320 --> 00:04:49.200] Like you have this whole world of things that you're used to
[00:04:49.200 --> 00:04:50.720] but then what happens if something happens
[00:04:50.720 --> 00:04:52.880] that is not in the training set, right?
[00:04:52.880 --> 00:04:55.360] So in 2017 there was a breakthrough
[00:04:55.360 --> 00:04:57.000] in what's known as deep learning,
[00:04:57.000 --> 00:04:58.800] with the paper "Attention Is All You Need"
[00:04:58.800 --> 00:05:00.680] about how to get an AI to pay attention
[00:05:00.680 --> 00:05:03.920] to the important things as opposed to just everything.
[00:05:03.920 --> 00:05:05.880] So we moved from just analyzing everything
[00:05:05.880 --> 00:05:08.200] to analyzing the important things.
[00:05:08.200 --> 00:05:10.080] And this is what's led to a lot of the transformative
[00:05:10.080 --> 00:05:13.800] breakthroughs that have allowed AI to get, in very narrow areas,
[00:05:13.800 --> 00:05:16.280] to human levels of performance in writing, reading,
[00:05:16.280 --> 00:05:19.440] playing Go, playing StarCraft, all sorts of things,
[00:05:19.440 --> 00:05:21.560] protein folding, et cetera as well.
[00:05:21.560 --> 00:05:24.600] And other breakthroughs that are not human, as it were.
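For the technically curious, a toy sketch of the scaled dot-product attention idea from that 2017 paper: each position scores how relevant every other position is and takes a weighted mix of them, so the model focuses on the important parts of the input rather than on everything equally. Shapes and values here are made up for illustration.

```python
# Toy scaled dot-product attention (NumPy only, no learned weights).
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # relevance of each key to each query
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of the values

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))  # three "tokens" with 4-dim embeddings
print(attention(Q, K, V))
```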
[00:05:24.600 --> 00:05:29.000] - And so it feels to me and probably the average person
[00:05:29.000 --> 00:05:31.560] that this all came out of nowhere
[00:05:31.560 --> 00:05:35.840] but we've had sort of incremental progress in AI
[00:05:35.840 --> 00:05:38.360] over the course of the last 20 years or so.
[00:05:38.360 --> 00:05:41.760] Can you give a quick primer on like back to the days
[00:05:41.760 --> 00:05:46.360] of Deep Blue and chess and DeepMind and Go?
[00:05:46.360 --> 00:05:48.680] Like what have been in your mind
[00:05:48.680 --> 00:05:51.160] the historical points along the way
[00:05:51.160 --> 00:05:53.320] that sort of led to this avalanche
[00:05:53.320 --> 00:05:55.800] that feels like it has just happened?
[00:05:55.800 --> 00:05:56.880] - Yeah, so like machine learning
[00:05:56.880 --> 00:05:58.320] was kind of the classical paradigm.
[00:05:58.320 --> 00:05:59.480] Actually one way to think about it as well
[00:05:59.480 --> 00:06:00.680] is that you got two parts of your brain,
[00:06:00.680 --> 00:06:02.240] the part of your brain that jumps to conclusions
[00:06:02.240 --> 00:06:03.280] and the logical part.
[00:06:03.280 --> 00:06:05.080] So one sees the world as it is, and the other's like, crap,
[00:06:05.080 --> 00:06:06.880] there's a tiger in the bush, right?
[00:06:06.880 --> 00:06:10.160] So classical AI was the more logical kind of way
[00:06:10.160 --> 00:06:12.520] and it was based on more and more and more data,
[00:06:12.520 --> 00:06:13.680] again big data.
[00:06:13.680 --> 00:06:16.240] So when Deep Blue beat Gary Kasparov
[00:06:16.240 --> 00:06:18.440] it's because it could think more moves ahead of him.
[00:06:18.440 --> 00:06:22.120] It just did pure crunching of the numbers.
[00:06:22.120 --> 00:06:25.000] It looked at every chess match and then it outperformed him.
[00:06:25.000 --> 00:06:27.200] And eventually people knew it would get to that point
[00:06:27.200 --> 00:06:29.240] but they didn't think it happened quite so quickly.
[00:06:29.240 --> 00:06:32.440] - And that was in 1996, '97-ish
[00:06:32.440 --> 00:06:35.920] and basically chess is kind of a constrained game
[00:06:35.920 --> 00:06:36.760] in a lot of ways.
[00:06:36.760 --> 00:06:39.000] Like there's only so many moves that can be made
[00:06:39.000 --> 00:06:41.440] and so you can computationally run through the history
[00:06:41.440 --> 00:06:42.960] of all of the moves and figure out
[00:06:42.960 --> 00:06:44.680] what the next best action is.
[00:06:44.680 --> 00:06:46.880] - And then also look at all of the previous moves.
[00:06:46.880 --> 00:06:50.080] So Gary Kasparov I believe could think five, six moves ahead
[00:06:50.080 --> 00:06:52.640] but then Deep Blue could think seven, eight moves ahead
[00:06:52.640 --> 00:06:54.840] and it was just a giant supercomputer literally,
[00:06:54.840 --> 00:06:56.600] you know, that kind of beat him.
[00:06:56.600 --> 00:06:58.960] And I was like, okay, that's the case.
[00:06:58.960 --> 00:07:02.280] Then humans started playing with computers
[00:07:02.280 --> 00:07:03.920] and computers started playing with humans
[00:07:03.920 --> 00:07:07.000] and now the best chess players are humans working
[00:07:07.000 --> 00:07:08.840] with computers which is very interesting.
[00:07:08.840 --> 00:07:11.600] But it was again this very defined space.
[00:07:11.600 --> 00:07:16.600] By contrast Go was a game that people thought
[00:07:16.600 --> 00:07:18.520] couldn't be beaten by this mechanism
[00:07:18.520 --> 00:07:20.640] because a Go board Chinese chess
[00:07:20.640 --> 00:07:24.320] has too many computational possibilities.
[00:07:24.320 --> 00:07:26.520] So you can't think X moves ahead
[00:07:26.520 --> 00:07:29.360] because you just get exponentially more compute required.
[00:07:29.360 --> 00:07:33.600] So DeepMind, a research lab out of London
[00:07:33.600 --> 00:07:37.240] now owned by Google built a system called AlphaGo
[00:07:37.240 --> 00:07:40.600] that created a computer that again, learned principles
[00:07:40.600 --> 00:07:42.520] and actually played against itself.
[00:07:42.520 --> 00:07:44.200] So they didn't even look at historical games
[00:07:44.200 --> 00:07:45.600] for the later versions.
[00:07:45.600 --> 00:07:48.440] They pitted it against Lee Sedol, who was,
[00:07:48.440 --> 00:07:49.960] I think it was seventh or ninth dan.
[00:07:49.960 --> 00:07:52.840] He was basically like the Magnus Carlsen of Go.
[00:07:52.840 --> 00:07:54.640] So far ahead of everyone and everyone's like,
[00:07:54.640 --> 00:07:56.360] it has no way they can beat him.
[00:07:56.360 --> 00:07:59.000] It drew with him once and beat him like another seven times
[00:07:59.000 --> 00:08:01.120] and I was like, wait, what?
[00:08:01.120 --> 00:08:03.120] Like it did that without doing those massive levels
[00:08:03.120 --> 00:08:04.440] of number crunching, 'cause it learned
[00:08:04.440 --> 00:08:06.160] what was important in moves
[00:08:06.160 --> 00:08:07.920] and the principles to do moves.
[00:08:07.920 --> 00:08:10.720] - Yeah, it was 2016 when this happened.
[00:08:10.720 --> 00:08:13.240] - Exactly, that was kind of the reinforcement,
[00:08:13.240 --> 00:08:15.920] self-supervised learning, which was one component of this,
[00:08:15.920 --> 00:08:17.720] before the deep learning
[00:08:17.720 --> 00:08:20.720] and transformer-based attention learning came, which was 2017,
[00:08:20.720 --> 00:08:22.400] which is the next step above that.
[00:08:22.400 --> 00:08:25.000] So there's a few things all happening at the same time
[00:08:25.000 --> 00:08:28.040] along with an exponential. Again, as a mathematician,
[00:08:28.040 --> 00:08:29.640] so these exponentials, like literally,
[00:08:29.640 --> 00:08:31.440] a lot of these things look like exponentials
[00:08:31.440 --> 00:08:32.880] 'cause they are exponentials of
[00:08:32.880 --> 00:08:35.320] increasing compute availability.
[00:08:35.320 --> 00:08:39.200] So what happened there is that it was very interesting.
[00:08:39.200 --> 00:08:42.000] Since that point where everyone's like,
[00:08:42.000 --> 00:08:44.880] holy crap, he got beat, his level has gone up in Go,
[00:08:44.880 --> 00:08:46.040] but then so has everyone else's.
[00:08:46.040 --> 00:08:48.480] So if you look at the average level of Go players,
[00:08:48.480 --> 00:08:50.200] it had been like this for about three decades
[00:08:50.200 --> 00:08:53.040] and now it does that, because the computer could think
[00:08:53.040 --> 00:08:55.040] in brand new ways and think about new, principled
[00:08:55.040 --> 00:08:56.920] ways to do it, and now humans plus computers
[00:08:56.920 --> 00:08:58.160] got even better.
[00:08:58.160 --> 00:09:01.000] Then there was the transformer based architecture paper.
[00:09:01.000 --> 00:09:01.920] I'm skipping over a lot.
[00:09:01.920 --> 00:09:04.640] There's a lot of stuff happening in deep learning
[00:09:04.640 --> 00:09:07.040] and it was this attention based system whereby it paid
[00:09:07.040 --> 00:09:10.280] attention to the most important parts of a given data set
[00:09:10.280 --> 00:09:14.360] that led to breakthroughs like GPT-3 in 2020.
[00:09:14.360 --> 00:09:18.600] GPT-3 is a model by OpenAI which is a research lab
[00:09:18.600 --> 00:09:21.200] primarily backed by Microsoft focused on
[00:09:21.200 --> 00:09:22.520] artificial general intelligence.
[00:09:22.520 --> 00:09:25.520] So how do you make an AI that can do just about anything?
[00:09:25.520 --> 00:09:28.920] That could write like a human.
[00:09:28.920 --> 00:09:31.000] So you give it like Legolas and Gimli
[00:09:31.000 --> 00:09:33.040] and it'll write you a whole story in the style
[00:09:33.040 --> 00:09:34.600] of Lord of the Rings.
[00:09:34.600 --> 00:09:36.400] But what it does is basically it guesses
[00:09:36.400 --> 00:09:38.560] what the next word in a sentence is
[00:09:38.560 --> 00:09:40.920] from a giant corpus of text,
[00:09:40.920 --> 00:09:42.840] actually not big, a few terabytes,
[00:09:42.840 --> 00:09:44.200] a few thousand gigabytes,
[00:09:44.200 --> 00:09:47.360] that was then run on a gigantic supercomputer.
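A minimal sketch of that "guess the next word" loop, using the openly available GPT-2 checkpoint via the Hugging Face `transformers` pipeline as a stand-in (GPT-3 itself is only reachable through OpenAI's API); the prompt is illustrative:

```python
# Sketch: autoregressive text generation -- repeatedly predict the next token
# given everything written so far.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

completion = generator(
    "Legolas and Gimli advanced on the orcs,",
    max_new_tokens=40,   # how many extra tokens to predict
    do_sample=True,      # sample rather than always taking the top guess
)[0]["generated_text"]
print(completion)
```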
[00:09:47.360 --> 00:09:50.560] So supercomputers kind of had a linear increase
[00:09:50.560 --> 00:09:53.080] in their capabilities over the years.
[00:09:53.080 --> 00:09:55.360] And you see things like the Apollo landing
[00:09:55.360 --> 00:09:59.240] is like same compute as your iPhone, right?
[00:09:59.240 --> 00:10:00.840] But that was still quite linear.
[00:10:00.840 --> 00:10:03.880] Over the last few years led by kind of Nvidia
[00:10:03.880 --> 00:10:05.480] and these GPU moments,
[00:10:05.480 --> 00:10:08.400] you've had an exponential increase in supercomputers.
[00:10:08.400 --> 00:10:10.560] And these models lend themselves to,
[00:10:10.560 --> 00:10:13.560] you take a relatively small amount of data
[00:10:13.560 --> 00:10:17.200] like the text writings of the whole of arXiv
[00:10:17.200 --> 00:10:20.480] or PubMed or a scrape of the internet
[00:10:20.480 --> 00:10:25.240] or like a billion images with captions.
[00:10:25.240 --> 00:10:26.920] And then you put it into the supercomputer
[00:10:26.920 --> 00:10:28.920] and the supercomputer looks at the connections
[00:10:28.920 --> 00:10:31.880] between the words and the images or the words in a sentence
[00:10:31.880 --> 00:10:33.000] and how they line up
[00:10:33.000 --> 00:10:35.360] to figure out what should come next.
[00:10:35.360 --> 00:10:37.000] So this was the big breakthrough in that
[00:10:37.000 --> 00:10:39.680] you didn't actually have to build a custom algorithm
[00:10:39.680 --> 00:10:41.440] for everything anymore.
[00:10:41.440 --> 00:10:43.400] There was one set of algorithms
[00:10:43.400 --> 00:10:46.960] like you had to do a good level of customization,
[00:10:46.960 --> 00:10:49.200] but the key edge and key differential
[00:10:49.200 --> 00:10:51.160] was no longer how big is your data set,
[00:10:51.160 --> 00:10:54.280] you know, or seeing how customers use the data set,
[00:10:54.280 --> 00:10:56.400] it was just how much compute do you have.
[00:10:56.400 --> 00:10:58.520] So more and more compute was applied to these models
[00:10:58.520 --> 00:10:59.480] and then they just broke through
[00:10:59.480 --> 00:11:01.520] and they got bigger and bigger and bigger.
[00:11:01.520 --> 00:11:06.520] So like GPT-3 was a 175 billion parameter model.
[00:11:06.520 --> 00:11:10.600] That's the kind of, you say it's the kind of things
[00:11:10.600 --> 00:11:11.520] that it knows.
[00:11:11.520 --> 00:11:15.840] And then it got to 500 billion and then bigger and bigger.
[00:11:15.840 --> 00:11:18.320] These large language models as they were called
[00:11:18.320 --> 00:11:22.600] that could be human level in answering questions, you know?
[00:11:22.600 --> 00:11:24.760] And this technology started to proliferate
[00:11:24.760 --> 00:11:26.720] because when you get to human level it was great,
[00:11:26.720 --> 00:11:30.880] but it didn't proliferate that fast because it was slow
[00:11:30.880 --> 00:11:32.200] and it was expensive.
[00:11:32.200 --> 00:11:33.920] And it required a lot of technical expertise
[00:11:33.920 --> 00:11:36.600] to even run these models, let alone create them.
[00:11:36.600 --> 00:11:39.360] And again, like the super compute levels
[00:11:39.360 --> 00:11:43.600] are just beyond belief and try to get to that.
[00:11:43.600 --> 00:11:46.200] At the start of last year on the image side,
[00:11:46.200 --> 00:11:49.360] which is one of the areas that we're focused,
[00:11:49.360 --> 00:11:52.440] OpenAI released something interesting called CLIP,
[00:11:52.440 --> 00:11:54.600] which was an image to text model.
[00:11:54.600 --> 00:11:58.040] So you could also generate text descriptions of images.
[00:11:58.040 --> 00:12:00.840] There were some generative models before that.
[00:12:00.840 --> 00:12:02.320] And so you had a generative model
[00:12:02.320 --> 00:12:04.040] and then you had a model that could tell you
[00:12:04.040 --> 00:12:05.560] what a generation was.
[00:12:05.560 --> 00:12:08.880] And so a bunch of groups came together and said,
[00:12:08.880 --> 00:12:10.640] "What if you bounce them off each other?
[00:12:10.640 --> 00:12:12.000] One that tells you what an image is
[00:12:12.000 --> 00:12:14.800] and one that tells you how to generate an image?
[00:12:14.800 --> 00:12:16.600] You could converge to better images."
[00:12:16.600 --> 00:12:18.640] And that's what kicked off this whole image revolution.
[00:12:18.640 --> 00:12:21.280] - You're converting from language.
[00:12:21.280 --> 00:12:24.320] So just text based or speech based or whatever,
[00:12:24.320 --> 00:12:25.720] into images.
[00:12:25.720 --> 00:12:28.880] You're bringing two different modalities together.
[00:12:28.880 --> 00:12:30.640] - Yeah, so the two models bounced off each other.
[00:12:30.640 --> 00:12:33.320] So it'd be like a dog in a stained glass window
[00:12:33.320 --> 00:12:34.440] and it produced a version.
[00:12:34.440 --> 00:12:37.120] And then the image to text model would be like,
[00:12:37.120 --> 00:12:38.640] "Ah, that's not that good."
[00:12:38.640 --> 00:12:41.840] It looks like that, but then there was the other prompt
[00:12:41.840 --> 00:12:43.240] and it made adjustments.
[00:12:43.240 --> 00:12:45.520] Then it went back and forth, back and forth, back and forth.
[00:12:45.520 --> 00:12:47.160] So you got something that looked a little bit
[00:12:47.160 --> 00:12:49.600] like a dog in a stained glass window.
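A minimal sketch of the "judge" half of that loop: a CLIP-style model scoring how well a candidate image matches each text prompt, so a guided generator can nudge its output toward the intended caption. The checkpoint is OpenAI's public CLIP release on Hugging Face; the image file and prompts are illustrative.

```python
# Sketch: score image-text similarity with CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("candidate.png")  # a generated candidate image
prompts = ["a dog in a stained glass window", "a cat on a sofa"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher probability = better match; a guided generator iterates until the
# intended prompt wins by a comfortable margin.
print(outputs.logits_per_image.softmax(dim=1))
```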
[00:12:49.600 --> 00:12:52.400] And then teams around the world,
[00:12:52.400 --> 00:12:54.400] led by a lot of people at Stability
[00:12:54.400 --> 00:12:57.800] and OpenAI and Meta and some other places,
[00:12:57.800 --> 00:13:00.400] started to think, "How can we really crack through this?"
[00:13:00.400 --> 00:13:02.840] And it just went a bit crazy to get to the point now
[00:13:02.840 --> 00:13:04.880] that you can generate a photorealistic image
[00:13:04.880 --> 00:13:06.440] of anything in about a second.
[00:13:06.440 --> 00:13:09.520] And again, this is part of the exponentials.
[00:13:09.520 --> 00:13:11.760] Like the amount of compute that we're using now
[00:13:11.760 --> 00:13:14.360] as a private company that's 14 months old,
[00:13:14.360 --> 00:13:17.160] is 10 times the compute of NASA put together.
[00:13:17.160 --> 00:13:20.760] Or 10 times the compute of the fastest supercomputer in the UK.
[00:13:20.760 --> 00:13:22.200] It would have been the fastest supercomputer
[00:13:22.200 --> 00:13:24.480] in the world just six years ago.
[00:13:24.480 --> 00:13:26.560] So for a private company to be able to access that
[00:13:26.560 --> 00:13:27.640] is a bit insane.
[00:13:27.640 --> 00:13:30.520] And how did you, you have a little bit of an unusual past
[00:13:30.520 --> 00:13:34.000] into this or path into this?
[00:13:34.000 --> 00:13:37.480] So can you talk through like how you actually ended up
[00:13:37.480 --> 00:13:39.480] at the forefront of a lot of these things?
[00:13:39.480 --> 00:13:42.000] - Oh, so yeah, I was quite lucky through life.
[00:13:42.000 --> 00:13:44.920] I was a hedge fund manager, I was a video game investor.
[00:13:44.920 --> 00:13:47.720] I took a break when my son was diagnosed with autism.
[00:13:47.720 --> 00:13:49.800] And I realized that AI could be used
[00:13:49.800 --> 00:13:51.520] to try and solve some of these things.
[00:13:51.520 --> 00:13:53.480] This is the old school AI,
[00:13:53.480 --> 00:13:55.960] because for people who know about autism,
[00:13:55.960 --> 00:13:57.360] which is a spectrum disorder,
[00:13:57.360 --> 00:13:59.520] there's no official treatment or cure,
[00:13:59.520 --> 00:14:01.800] or nobody knows actually what causes it.
[00:14:01.800 --> 00:14:04.440] And so I was like, what if we did an analysis
[00:14:04.440 --> 00:14:06.520] of all the different things that people think
[00:14:06.520 --> 00:14:07.800] that cause it and try and figure out
[00:14:07.800 --> 00:14:10.560] some commonalities with AI?
[00:14:10.560 --> 00:14:13.120] And then identified some things in the brain,
[00:14:13.120 --> 00:14:14.360] GABA and glutamate balance.
[00:14:14.360 --> 00:14:15.560] So GABA calms you down,
[00:14:15.560 --> 00:14:16.960] and glutamate makes you excited.
[00:14:16.960 --> 00:14:20.280] So when you need to take Valium, your GABA goes up.
[00:14:20.280 --> 00:14:21.400] When your brain is too excited,
[00:14:21.400 --> 00:14:23.080] it's like when you're tapping your leg
[00:14:23.080 --> 00:14:26.120] and you can't focus and pay attention to things, right?
[00:14:26.120 --> 00:14:29.880] And so kids with ASD are often like that,
[00:14:29.880 --> 00:14:32.000] in that they can't pay attention to form links
[00:14:32.000 --> 00:14:34.720] between words and images and concepts,
[00:14:34.720 --> 00:14:37.680] actually very similar to these diffusion-based image models.
[00:14:37.680 --> 00:14:39.480] So a cup can mean cup your hands,
[00:14:39.480 --> 00:14:41.600] a cup that you've got like that, a World Cup,
[00:14:41.600 --> 00:14:42.960] you know, maybe Argentina or France,
[00:14:42.960 --> 00:14:45.720] or who wins it, who knows; we're recording just before that.
[00:14:45.720 --> 00:14:47.120] So you need to calm down the brain somewhere.
[00:14:47.120 --> 00:14:47.960] But when I looked at it,
[00:14:47.960 --> 00:14:50.640] like there were 18 different things that led to that,
[00:14:50.640 --> 00:14:52.400] potentially, and certain treatments
[00:14:52.400 --> 00:14:54.160] that make some kids worse, some kids better.
[00:14:54.160 --> 00:14:55.520] So we did a lot of drug repurposing
[00:14:55.520 --> 00:14:56.640] on an n of one,
[00:14:56.640 --> 00:14:58.200] and then I was advising governments
[00:14:58.200 --> 00:14:59.720] and things at the time about AI
[00:14:59.720 --> 00:15:02.200] and all sorts of other topics.
[00:15:02.200 --> 00:15:03.760] I was like, this is really powerful technology,
[00:15:03.760 --> 00:15:05.040] but, you know, I'm not a doctor,
[00:15:05.040 --> 00:15:08.480] so I did my best to tell other people about it, but it's okay.
[00:15:08.480 --> 00:15:09.520] But then about a few years ago,
[00:15:09.520 --> 00:15:11.000] I realized that actually this technology
[00:15:11.000 --> 00:15:12.480] could change the world.
[00:15:12.480 --> 00:15:15.440] So we used it first in education in a small set
[00:15:15.440 --> 00:15:16.960] of refugee camps around the world,
[00:15:16.960 --> 00:15:18.440] through the charity my co-founder
[00:15:18.440 --> 00:15:19.720] runs, Imagine Worldwide.
[00:15:19.720 --> 00:15:21.920] And that's going massive,
[00:15:21.920 --> 00:15:23.560] and there'll be announcements next year.
[00:15:23.560 --> 00:15:26.800] And then working on the United Nations AI response
[00:15:26.800 --> 00:15:28.080] on COVID-19 as well,
[00:15:28.080 --> 00:15:28.920] because again, it's this thing
[00:15:28.920 --> 00:15:30.240] where it's a multi-systemic condition,
[00:15:30.240 --> 00:15:31.560] no one knew what was causing it,
[00:15:31.560 --> 00:15:33.720] and that knowledge needed to be organized.
[00:15:33.720 --> 00:15:35.600] Had loads of bureaucracy through that,
[00:15:35.600 --> 00:15:38.240] lots of companies promising stuff that wouldn't deliver,
[00:15:38.240 --> 00:15:40.800] really got into this sector and realized,
[00:15:40.800 --> 00:15:42.800] this AI is possibly the most powerful thing
[00:15:42.800 --> 00:15:43.640] we've ever seen,
[00:15:43.640 --> 00:15:45.960] because human level means a lot, right?
[00:15:45.960 --> 00:15:47.840] And the only people that could build that
[00:15:47.840 --> 00:15:50.560] were basically the big tech companies
[00:15:50.560 --> 00:15:52.600] plus OpenAI and a couple of others.
[00:15:53.600 --> 00:15:55.680] And none of them wanted to release it open,
[00:15:55.680 --> 00:15:58.320] because it is powerful.
[00:15:58.320 --> 00:16:00.080] And powerful means also dangerous, right?
[00:16:00.080 --> 00:16:01.400] There's always an upside and downside.
[00:16:01.400 --> 00:16:03.280] I don't believe technology is neutral.
[00:16:03.280 --> 00:16:05.080] But the way they were doing it,
[00:16:05.080 --> 00:16:06.240] it would never be released.
[00:16:06.240 --> 00:16:08.320] So it would only be available to a select few,
[00:16:08.320 --> 00:16:11.080] and the select few could create any image in seconds,
[00:16:11.080 --> 00:16:12.840] or write an entire story.
[00:16:12.840 --> 00:16:14.200] Where would it ever go to India,
[00:16:14.200 --> 00:16:16.120] or Africa, or places like that?
[00:16:16.120 --> 00:16:17.280] So that's why I thought,
[00:16:17.280 --> 00:16:20.080] this is infrastructure just as important as the internet,
[00:16:20.080 --> 00:16:22.600] for the next step in kind of human ability,
[00:16:22.600 --> 00:16:23.600] and it should be open source.
[00:16:23.600 --> 00:16:26.280] And then also, it's a better business model as well,
[00:16:26.280 --> 00:16:28.320] putting on my hedge fund manager hat.
[00:16:28.320 --> 00:16:30.320] 'Cause all of our vital infrastructure
[00:16:30.320 --> 00:16:32.920] for the internet, servers, databases, DevOps,
[00:16:32.920 --> 00:16:34.840] it's all turned open source now.
[00:16:34.840 --> 00:16:36.760] And the simple business model is scale and service.
[00:16:36.760 --> 00:16:38.600] People come to you when they wanna scale it
[00:16:38.600 --> 00:16:39.800] or want custom versions of it.
[00:16:39.800 --> 00:16:41.880] And I thought that's the winning business strategy.
[00:16:41.880 --> 00:16:43.160] So that's how I started Stability
[00:16:43.160 --> 00:16:44.840] as a mission-based organization,
[00:16:44.840 --> 00:16:47.040] with a profit-based focus.
[00:16:47.040 --> 00:16:48.680] But the profit is making these models
[00:16:48.680 --> 00:16:50.120] available to everyone,
[00:16:50.120 --> 00:16:54.440] and customizing them and scaling them for everyone as well.
[00:16:54.440 --> 00:16:55.320] It has been interesting,
[00:16:55.320 --> 00:16:58.240] as not a Silicon Valley native, or anything like that,
[00:16:58.240 --> 00:16:59.440] and talking to people about
[00:16:59.440 --> 00:17:02.240] why we should let people have access to this technology,
[00:17:02.240 --> 00:17:04.320] and people are generally good, not bad.
[00:17:04.320 --> 00:17:07.960] But yeah, it's been quite a ride.
[00:17:07.960 --> 00:17:09.280] - I wanna play that back to you a little bit,
[00:17:09.280 --> 00:17:12.280] because it's how you actually got into it,
[00:17:12.280 --> 00:17:14.760] is such an interesting part of this.
[00:17:14.760 --> 00:17:18.160] So you were working at a hedge fund,
[00:17:18.160 --> 00:17:21.760] and you took a break because of your son's autism disorder,
[00:17:21.760 --> 00:17:24.400] and you were able to do a bunch of different stuff
[00:17:24.400 --> 00:17:26.360] with traditional AI to kind of figure out
[00:17:26.360 --> 00:17:28.800] how to make sense of the different drugs,
[00:17:28.800 --> 00:17:30.640] and the causes, and all that stuff.
[00:17:30.640 --> 00:17:33.480] And then, did you see some people,
[00:17:33.480 --> 00:17:37.320] was it open AI, and what they were working on,
[00:17:37.320 --> 00:17:38.240] and you saw that,
[00:17:38.240 --> 00:17:40.080] and how did the actual stable diffusion,
[00:17:40.080 --> 00:17:42.920] and the involvement, and all of that stuff come to be?
[00:17:42.920 --> 00:17:44.840] Because it sounds like you were around the industry,
[00:17:44.840 --> 00:17:47.000] but did you meet someone and say,
[00:17:47.000 --> 00:17:49.200] "Was someone already working on stable diffusion,
[00:17:49.200 --> 00:17:50.640] the open source project?"
[00:17:50.640 --> 00:17:53.840] - So I got involved about two and a half years ago,
[00:17:53.840 --> 00:17:56.080] in EleutherAI, as part of the community,
[00:17:56.080 --> 00:17:56.920] where we were like,
[00:17:56.920 --> 00:17:58.960] let's build an open source version of GPT-3,
[00:17:58.960 --> 00:18:01.480] 'cause OpenAI stopped releasing stuff
[00:18:01.480 --> 00:18:04.280] after their investment from Microsoft in 2019,
[00:18:04.280 --> 00:18:05.920] 'cause they said it was too dangerous,
[00:18:05.920 --> 00:18:07.080] which is ironic,
[00:18:07.080 --> 00:18:09.800] given the original founding statement of open AI,
[00:18:09.800 --> 00:18:12.360] but again, it's their prerogative, right?
[00:18:12.360 --> 00:18:13.840] 'Cause again, technology that's powerful
[00:18:13.840 --> 00:18:17.640] can be considered dangerous, and you have to be thoughtful about that.
[00:18:17.640 --> 00:18:18.680] - So it was language models first,
[00:18:18.680 --> 00:18:20.800] but then January of last year,
[00:18:20.800 --> 00:18:21.640] when CLIP came out,
[00:18:21.640 --> 00:18:22.880] I actually built a system,
[00:18:22.880 --> 00:18:24.680] that's when I was actually getting over COVID,
[00:18:24.680 --> 00:18:27.360] I built a system for my daughter to generate art,
[00:18:27.360 --> 00:18:29.520] based on that, and it was amazing.
[00:18:29.520 --> 00:18:31.800] So she created a vision board of what she wants to make,
[00:18:31.800 --> 00:18:32.760] and then she made a description,
[00:18:32.760 --> 00:18:34.840] and she created 16 images very slowly,
[00:18:34.840 --> 00:18:37.080] which were like a bit smooshy and stylistic,
[00:18:37.080 --> 00:18:39.760] and then she told how each one of them was different,
[00:18:39.760 --> 00:18:42.120] and the system interpreted that to generate 16 more,
[00:18:42.120 --> 00:18:43.560] 16 more, 16 more.
[00:18:43.560 --> 00:18:45.880] Eight hours later, she generated an image,
[00:18:45.880 --> 00:18:48.760] that she then sold as an NFT for India COVID relief.
[00:18:48.760 --> 00:18:50.400] She raised $3,500.
[00:18:50.400 --> 00:18:52.960] This is amazing,
[00:18:52.960 --> 00:18:55.160] especially because I have aphantasia,
[00:18:55.160 --> 00:18:57.880] so I can't view anything in my head.
[00:18:57.880 --> 00:18:59.520] It's a condition where you can't visualize anything,
[00:18:59.520 --> 00:19:01.520] and I was like, suddenly I can visualize stuff.
[00:19:01.520 --> 00:19:03.520] Wouldn't it be great if anyone could visualize stuff,
[00:19:03.520 --> 00:19:06.240] 'cause the way that I've thought about things is that,
[00:19:06.240 --> 00:19:09.000] you and I doing what we're doing right now, talking,
[00:19:09.000 --> 00:19:10.880] is the easiest thing in the world for humans to do,
[00:19:10.880 --> 00:19:11.720] relatively speaking.
[00:19:11.720 --> 00:19:12.960] Sometimes you need a drink, right?
[00:19:12.960 --> 00:19:15.280] But it's still relatively easy.
[00:19:15.280 --> 00:19:17.600] Written is harder, that's why we pay people to be writers,
[00:19:17.600 --> 00:19:20.560] and image is the hardest, creating art or PowerPoint,
[00:19:20.560 --> 00:19:22.280] it's just really difficult and painful.
[00:19:22.280 --> 00:19:25.160] But this technology can make it easy, so let's fund that.
[00:19:25.160 --> 00:19:27.000] So last year I funded the whole space,
[00:19:27.000 --> 00:19:30.200] and all the notebooks and models and developers,
[00:19:30.200 --> 00:19:33.480] like I hired them, I funded them, gave them benefits,
[00:19:33.480 --> 00:19:36.920] whatever they wanted, started building the compute resources,
[00:19:36.920 --> 00:19:38.960] and there were a whole bunch of different models.
[00:19:38.960 --> 00:19:42.520] The stable diffusion model came about from latent diffusion,
[00:19:42.520 --> 00:19:44.560] which came out of CompVis.
[00:19:44.560 --> 00:19:47.880] So that was a paper led and written by Robin Rombach,
[00:19:47.880 --> 00:19:50.040] who's our lead generative AI researcher,
[00:19:50.040 --> 00:19:52.720] and Andreas Blattmann, who's joining us shortly.
[00:19:52.720 --> 00:19:56.160] That was kind of a bit of a breakthrough in high speed,
[00:19:56.160 --> 00:19:58.400] because they didn't have access to many GPUs,
[00:19:58.400 --> 00:20:01.120] so they really optimized for high speed diffusion.
[00:20:01.120 --> 00:20:04.560] Most of the advances in the sector, I think,
[00:20:04.560 --> 00:20:06.480] can be credited probably to Katherine Crowson,
[00:20:06.480 --> 00:20:08.440] Rivers Have Wings is her Twitter handle,
[00:20:08.440 --> 00:20:10.960] who's our other lead generative AI researcher.
[00:20:10.960 --> 00:20:12.960] And again, she was just in the community
[00:20:12.960 --> 00:20:14.840] and we were just delighted to support her
[00:20:14.840 --> 00:20:16.080] in kind of building these models,
[00:20:16.080 --> 00:20:19.240] as well as other teams like the ruDALL-E team and others.
[00:20:19.240 --> 00:20:22.360] But then in about February of this year,
[00:20:22.360 --> 00:20:24.000] kind of Robin messaged me, and he's like,
[00:20:24.000 --> 00:20:25.000] "We need to scale this up.
[00:20:25.000 --> 00:20:26.720] I think it could be a breakthrough."
[00:20:26.720 --> 00:20:27.560] I agreed to it.
[00:20:27.560 --> 00:20:29.640] And then the original stable diffusion released in August
[00:20:29.640 --> 00:20:32.760] was under LMU CompVis.
[00:20:32.760 --> 00:20:34.800] So CompVis is the lab led by Björn Ommer,
[00:20:34.800 --> 00:20:37.440] and Robin, and then Patrick,
[00:20:37.440 --> 00:20:38.720] who's at RunwayML
[00:20:38.720 --> 00:20:40.880] as their lead generative AI researcher,
[00:20:40.880 --> 00:20:42.200] were the two leads on that.
[00:20:42.200 --> 00:20:45.720] So Robin is at Stability and then Patrick is there,
[00:20:45.720 --> 00:20:47.680] creating it, because the approach that I've always taken
[00:20:47.680 --> 00:20:49.320] at Stability, because we support communities
[00:20:49.320 --> 00:20:51.600] doing all the models, is a collaborative one,
[00:20:51.600 --> 00:20:55.680] whereby our core team, infra team,
[00:20:55.680 --> 00:20:58.200] academics, independents, and others all coming together
[00:20:58.200 --> 00:20:59.840] can build much better technology.
[00:20:59.840 --> 00:21:03.280] And that's what happened with stable diffusion.
[00:21:03.280 --> 00:21:04.360] A whole bunch of people got together,
[00:21:04.360 --> 00:21:06.680] but it was really Robin and Patrick leading it.
[00:21:07.520 --> 00:21:10.440] And they pushed the boundaries and achieved amazing things.
[00:21:10.440 --> 00:21:12.240] They took 100,000 gigabytes of images
[00:21:12.240 --> 00:21:14.800] and compressed it down to a 1.6 gigabyte file
[00:21:14.800 --> 00:21:16.320] that could create just about anything.
[00:21:16.320 --> 00:21:18.760] And that was insane.
[00:21:18.760 --> 00:21:21.000] And that was released August 23rd.
[00:21:21.000 --> 00:21:23.040] And yeah, since then.
[00:21:23.040 --> 00:21:24.560] So what's been the growth?
[00:21:24.560 --> 00:21:26.120] So the company was established,
[00:21:26.120 --> 00:21:28.440] so just so folks are following,
[00:21:28.440 --> 00:21:30.640] stable diffusion is the open source project
[00:21:30.640 --> 00:21:32.600] which you helped fund,
[00:21:32.600 --> 00:21:36.520] and part of your team also started.
[00:21:36.520 --> 00:21:40.160] And then the company around it is Stability AI.
[00:21:40.160 --> 00:21:44.000] So when was Stability actually incorporated?
[00:21:44.000 --> 00:21:45.280] It was probably about two years ago
[00:21:45.280 --> 00:21:48.040] when we were leading one of the UN AI initiatives.
[00:21:48.040 --> 00:21:51.080] So we designed and architected that.
[00:21:51.080 --> 00:21:53.360] And then it kicked off probably about 14 months ago,
[00:21:53.360 --> 00:21:55.400] saying let's do all the types of AI.
[00:21:55.400 --> 00:21:57.600] So right now we do all the types of AI
[00:21:57.600 --> 00:22:00.240] from language models to protein to image and others.
[00:22:00.240 --> 00:22:02.480] But stable diffusion is the most popular open source software
[00:22:02.480 --> 00:22:03.760] in the world ever.
[00:22:03.760 --> 00:22:06.280] So since launching August 23rd,
[00:22:06.280 --> 00:22:09.920] it's received 46,000 GitHub stars
[00:22:09.920 --> 00:22:12.840] between version one, which was this collaborative thing,
[00:22:12.840 --> 00:22:15.240] and version two, which was our highly optimized version
[00:22:15.240 --> 00:22:17.240] that we ourselves released.
[00:22:17.240 --> 00:22:19.360] Plus a bunch of tools around that.
[00:22:19.360 --> 00:22:20.240] To give you an example,
[00:22:20.240 --> 00:22:22.520] it's overtaken Bitcoin and Ethereum,
[00:22:22.520 --> 00:22:24.880] which took about 10 years to get to that level
[00:22:24.880 --> 00:22:26.400] of developer interest.
[00:22:26.400 --> 00:22:28.880] And when you add up all the stars of the ecosystem,
[00:22:28.880 --> 00:22:30.880] it's now the most popular open source software
[00:22:30.880 --> 00:22:33.040] in the world ever, just in three months.
[00:22:34.040 --> 00:22:36.320] So the other models are amazing,
[00:22:36.320 --> 00:22:38.040] like the language models from Alutha,
[00:22:38.040 --> 00:22:39.640] which is one of the communities that we support,
[00:22:39.640 --> 00:22:41.680] and we hope to spin off into a foundation soon.
[00:22:41.680 --> 00:22:43.600] They've been downloaded 25 million times
[00:22:43.600 --> 00:22:46.920] and are the most popular language models in the world.
[00:22:46.920 --> 00:22:49.480] But this thing is just the most disruptive thing ever,
[00:22:49.480 --> 00:22:51.880] and next year it's gonna get even more disruptive.
[00:22:51.880 --> 00:22:53.600] It's what powers things like Lensa,
[00:22:53.600 --> 00:22:54.960] which is the number one app on the app store.
[00:22:54.960 --> 00:22:57.720] I think they're making $5 million a day.
[00:22:57.720 --> 00:22:58.560] It's quite nice.
[00:22:58.560 --> 00:23:01.160] And a whole bunch of other things.
[00:23:01.160 --> 00:23:02.960] - Maybe, tell people what Lensa is.
[00:23:02.960 --> 00:23:04.960] I played around with it, but...
[00:23:04.960 --> 00:23:06.760] - Yeah, so Lensa or Dawn AI,
[00:23:06.760 --> 00:23:08.440] you upload 10 pictures of your face,
[00:23:08.440 --> 00:23:10.920] and then it puts you in all sorts of different,
[00:23:10.920 --> 00:23:14.160] it's like artistic variants and things like that.
[00:23:14.160 --> 00:23:18.000] - We'll upload here my version of it for people to see,
[00:23:18.000 --> 00:23:21.920] but it's super cool to see the power of these things.
[00:23:21.920 --> 00:23:23.480] - But these things are getting, again,
[00:23:23.480 --> 00:23:25.200] exponentially more powerful.
[00:23:25.200 --> 00:23:28.640] So when we released stable diffusion August the 23rd,
[00:23:28.640 --> 00:23:31.120] it was 5.6 seconds for an image
[00:23:31.120 --> 00:23:33.120] on the highest end graphics card.
[00:23:33.120 --> 00:23:36.520] Now it's 0.9 seconds for an image.
[00:23:36.520 --> 00:23:39.600] In January, it'll be 30 images a second.
[00:23:39.600 --> 00:23:41.080] There's a hundred times speed increase
[00:23:41.080 --> 00:23:42.720] that we've managed to achieve
[00:23:42.720 --> 00:23:45.160] working with various teams around the world,
[00:23:45.160 --> 00:23:47.320] which is insane for this tiny one gigabyte file.
[00:23:47.320 --> 00:23:49.200] So what you just saw with Lensa,
[00:23:49.200 --> 00:23:52.320] imagine if you could do that whole process 20 times faster.
[00:23:52.320 --> 00:23:53.320] - I mean, it's super cool.
[00:23:53.320 --> 00:23:55.880] Hopefully people admire the picture of me
[00:23:55.880 --> 00:23:58.640] that we showed on screen for YouTube people.
[00:23:58.640 --> 00:24:02.880] But what are the use cases today
[00:24:02.880 --> 00:24:07.480] that have people so excited in a practical sense?
[00:24:07.480 --> 00:24:09.600] Like obviously it's cool to be able to do this
[00:24:09.600 --> 00:24:11.040] in real time of myself, but--
[00:24:11.040 --> 00:24:13.240] - Look, it disrupts the entire creative industry
[00:24:13.240 --> 00:24:16.480] and a year or two will be generating whole movies real time.
[00:24:16.480 --> 00:24:17.760] - And what does that actually mean?
[00:24:17.760 --> 00:24:19.960] - It means you describe that I want to generate a movie
[00:24:19.960 --> 00:24:21.240] about, I don't know,
[00:24:21.240 --> 00:24:25.560] Emad and Logan having a coffee at Starbucks or whatever.
[00:24:25.560 --> 00:24:27.520] You input a few assets of our faces
[00:24:27.520 --> 00:24:29.680] and then a short while later you have a movie
[00:24:29.680 --> 00:24:31.160] about them having a chat
[00:24:31.160 --> 00:24:33.000] and the chat is instantly generated as well
[00:24:33.000 --> 00:24:34.600] about any topic that you want.
[00:24:34.600 --> 00:24:36.720] If you get a practical example right now,
[00:24:36.720 --> 00:24:38.680] there's a film that is shooting,
[00:24:38.680 --> 00:24:41.480] I can't reveal details, with some very famous people.
[00:24:41.480 --> 00:24:44.480] They had to do a photo binder
[00:24:44.480 --> 00:24:47.920] with like 30 different actresses inside it
[00:24:47.920 --> 00:24:50.800] and those actresses were victims of a serial killer.
[00:24:50.800 --> 00:24:53.160] It would have cost half a million dollars
[00:24:53.160 --> 00:24:54.680] when you look at SAG daily rates,
[00:24:54.680 --> 00:24:56.920] makeup, shooting, everything like that.
[00:24:56.920 --> 00:24:58.880] Did it in two hours using this technology,
[00:24:58.880 --> 00:25:01.760] save the production half a million dollars.
[00:25:01.760 --> 00:25:06.120] We're seeing companies basically bring videos to market,
[00:25:06.120 --> 00:25:10.480] 75% quicker, so 20, well, three, four times quicker.
[00:25:10.480 --> 00:25:12.440] Video games will be generated,
[00:25:12.440 --> 00:25:14.240] the assets for that even quicker as well,
[00:25:14.240 --> 00:25:16.360] it's about 25% of video game budgets.
[00:25:16.360 --> 00:25:17.920] So the people that are using this technology
[00:25:17.920 --> 00:25:20.800] are just massively slashing creation costs.
[00:25:20.800 --> 00:25:23.600] So there's a real enterprise solution version of that.
[00:25:23.600 --> 00:25:25.640] For the average individual listening to this,
[00:25:25.640 --> 00:25:27.120] this is the technology that will mean
[00:25:27.120 --> 00:25:29.200] that you'll never have to build a PowerPoint slide again
[00:25:29.200 --> 00:25:31.880] in a couple of years because you just describe it
[00:25:31.880 --> 00:25:34.440] and then you'll say make it happier or sad or whatever
[00:25:34.440 --> 00:25:36.800] when you combine this with a language model and code model.
[00:25:36.800 --> 00:25:39.480] And you'll never have to see any of that abstraction.
[00:25:39.480 --> 00:25:43.760] You know, so this is crazy impactful technology.
[00:25:43.760 --> 00:25:45.520] The fact that it goes real time,
[00:25:45.520 --> 00:25:48.160] the, it's not just the image creation, right?
[00:25:48.160 --> 00:25:49.000] It's the image editing.
[00:25:49.000 --> 00:25:50.760] So we released an in painting model,
[00:25:50.760 --> 00:25:52.520] a depth to image that takes your face
[00:25:52.520 --> 00:25:55.160] and puts it into 3D and then understands all the lighting
[00:25:55.160 --> 00:25:57.200] so you can adjust your lighting dynamically.
[00:25:57.200 --> 00:26:02.600] Upscaler, so you can go from 1K to 4K, just almost real time.
[00:26:02.600 --> 00:26:04.440] Making that real time means that, you know,
[00:26:04.440 --> 00:26:06.880] you could say I want Emma to have a hat
[00:26:06.880 --> 00:26:10.480] and then I want them to have a less bushy mustache, you know,
[00:26:10.480 --> 00:26:12.240] what happens if his eyes are green
[00:26:12.240 --> 00:26:13.720] and it will do that instantly.
[00:26:13.720 --> 00:26:16.800] It removes all the barriers to creation.
[00:26:16.800 --> 00:26:18.200] I think people aren't ready for that.
[00:26:18.200 --> 00:26:19.680] I'm not ready for that.
[00:26:19.680 --> 00:26:21.660] And it's there right now.
[00:26:21.660 --> 00:26:24.440] This is also the case when you see it go consumer.
[00:26:24.440 --> 00:26:26.080] Like I said, Lenza has gone,
[00:26:26.080 --> 00:26:28.200] our kids tell us about it and stuff like that.
[00:26:28.200 --> 00:26:30.160] And there's other technology that's happening exactly
[00:26:30.160 --> 00:26:32.320] at the same time that will be just as disruptive
[00:26:32.320 --> 00:26:33.920] like ChatGPT, et cetera.
[00:26:33.920 --> 00:26:35.640] - Yeah, I want to get into the chat side of it.
[00:26:35.640 --> 00:26:39.240] But while we're talking about image and video and all that,
[00:26:39.240 --> 00:26:42.560] I guess I'll give you an opportunity to wax poetic
[00:26:42.560 --> 00:26:46.560] and maybe a little philosophically about the implications
[00:26:46.560 --> 00:26:50.560] of what this actually means and what creativity,
[00:26:50.560 --> 00:26:52.920] 'cause I assume anyone listening to this,
[00:26:52.920 --> 00:26:56.160] like some of the things of the ability to make a movie
[00:26:56.160 --> 00:26:59.360] in real time about the two of us having coffee
[00:26:59.360 --> 00:27:01.800] and change your mustache and all that,
[00:27:01.800 --> 00:27:03.800] it's a great tangible example,
[00:27:03.800 --> 00:27:07.640] but maybe hard to understand why that matters.
[00:27:07.640 --> 00:27:09.760] And obviously there's more tangible cases
[00:27:09.760 --> 00:27:12.800] of the ability to green screen out backgrounds
[00:27:12.800 --> 00:27:16.760] with a serial killer example in the movie example you gave.
[00:27:16.760 --> 00:27:19.600] But can you talk a little bit about like the power
[00:27:19.600 --> 00:27:23.760] of creativity and imagery and what you think
[00:27:23.760 --> 00:27:26.480] this unlocks for people?
[00:27:26.480 --> 00:27:28.600] - Yeah, look, my mom sends me memes every day now
[00:27:28.600 --> 00:27:30.680] using this technology about why I don't call her
[00:27:30.680 --> 00:27:33.320] and my guilt levels have gone up massively.
[00:27:33.320 --> 00:27:35.520] You know, it allows people just to create anything
[00:27:35.520 --> 00:27:38.040] and the value of creativity versus consumption
[00:27:38.040 --> 00:27:39.120] can't be underestimated.
[00:27:39.120 --> 00:27:40.720] Like one of the most effective therapies
[00:27:40.720 --> 00:27:43.080] for mental health is art therapy for a reason
[00:27:43.080 --> 00:27:45.400] because this is communication.
[00:27:45.400 --> 00:27:48.320] And how many people listening to this right now
[00:27:48.320 --> 00:27:49.960] believe they can create?
[00:27:49.960 --> 00:27:52.600] Probably very few, but the reality is that everyone can,
[00:27:52.600 --> 00:27:54.200] but they don't have the tools to
[00:27:54.200 --> 00:27:55.120] and they have barriers to it.
[00:27:55.120 --> 00:27:57.400] Those barriers will be removed as of next year.
[00:27:57.400 --> 00:27:59.120] You'll be able to create anything you can imagine,
[00:27:59.120 --> 00:28:02.640] first in 2D, then in audio, then in 3D, then in video.
[00:28:02.640 --> 00:28:04.520] And then you'll be wanting to share stories.
[00:28:04.520 --> 00:28:06.160] I don't think it'll be like you remember that
[00:28:06.160 --> 00:28:08.560] she's seen in Wall-E where he's got like that VR headset
[00:28:08.560 --> 00:28:11.400] and he's all fat and stuff and everyone's in their own world.
[00:28:11.400 --> 00:28:12.240] I don't think that's the case.
[00:28:12.240 --> 00:28:14.680] People like sharing; we're story-driven, narrative creatures
[00:28:14.680 --> 00:28:16.560] and this allows us to tell more stories.
[00:28:16.560 --> 00:28:18.840] And I think it gives people agency
[00:28:18.840 --> 00:28:21.800] because, you know, again, like I've done lots of art therapy
[00:28:21.800 --> 00:28:24.640] with people, like it really improves their lives.
[00:28:24.640 --> 00:28:26.560] It improves your life when you are creating
[00:28:26.560 --> 00:28:28.080] no matter what it is.
[00:28:28.080 --> 00:28:30.320] And then again, too few of us believe that we can.
[00:28:30.320 --> 00:28:31.640] We lose that childhood joy, right?
[00:28:31.640 --> 00:28:34.160] Like when you're a kid, of course you can create.
[00:28:34.160 --> 00:28:35.960] Then you get to your teenage years and you're like,
[00:28:35.960 --> 00:28:37.120] "Ah, those people are better than me.
[00:28:37.120 --> 00:28:38.520] I don't have time to do that."
[00:28:38.520 --> 00:28:40.160] And then it moves away from that.
[00:28:40.160 --> 00:28:42.440] And when you get to sad old folk like us,
[00:28:42.440 --> 00:28:45.560] it's like, I have to learn to draw or paint
[00:28:45.560 --> 00:28:46.400] or something like that.
[00:28:46.400 --> 00:28:49.040] You do that on a holiday and then it's really rewarding.
[00:28:49.040 --> 00:28:51.280] But too many people don't have access to that.
[00:28:51.280 --> 00:28:53.160] I think, again, we've made that happen.
[00:28:53.160 --> 00:28:56.240] So I think creation beats consumption
[00:28:56.240 --> 00:28:58.280] and now everyone can create.
[00:28:58.280 --> 00:29:00.400] And so I think the world will be happier.
[00:29:00.400 --> 00:29:02.440] Some people are gonna use it in a douchebag way
[00:29:02.440 --> 00:29:04.040] but that's why we live in a society
[00:29:04.040 --> 00:29:06.160] that has mitigants against this.
[00:29:06.160 --> 00:29:08.840] And this technology is pretty inevitable.
[00:29:08.840 --> 00:29:10.280] It's going now.
[00:29:10.280 --> 00:29:13.080] And again, exponentials are a hell of a thing.
[00:29:13.080 --> 00:29:14.560] So we've got to get used to it
[00:29:14.560 --> 00:29:15.720] where it impacts our industries
[00:29:15.720 --> 00:29:17.400] and we're gonna take advantage of it
[00:29:17.400 --> 00:29:19.200] where it can make our lives better.
[00:29:19.200 --> 00:29:21.680] - The inevitability of a lot of things you're talking about
[00:29:21.680 --> 00:29:25.000] is increased productivity, increased efficiency,
[00:29:25.000 --> 00:29:27.640] increased creativity, right?
[00:29:27.640 --> 00:29:30.920] The flip side of any big gain in productivity
[00:29:30.920 --> 00:29:35.600] is a loss of potential jobs
[00:29:35.600 --> 00:29:38.240] or existing skill sets or responsibilities
[00:29:38.240 --> 00:29:39.080] that people have.
[00:29:39.080 --> 00:29:42.000] In your case, if you speed something up 75%,
[00:29:42.000 --> 00:29:43.680] there's some vector of time.
[00:29:43.680 --> 00:29:45.600] And so we're giving back time to people.
[00:29:45.600 --> 00:29:48.560] But there's gonna be an inevitability that
[00:29:48.560 --> 00:29:51.200] some people are gonna probably lose their jobs
[00:29:51.200 --> 00:29:52.920] over these types of things.
[00:29:52.920 --> 00:29:56.400] How do you think about that in productivity gains
[00:29:56.400 --> 00:30:00.760] versus some of the things that could create job loss?
[00:30:00.760 --> 00:30:03.240] - Yeah, so this has been the nature of technology, right?
[00:30:03.240 --> 00:30:05.240] Technology has always been about productivity gains.
[00:30:05.240 --> 00:30:08.040] And yet here we are at full employment today.
[00:30:08.040 --> 00:30:09.640] Like I believe the number of photographers
[00:30:09.640 --> 00:30:12.680] has increased by 25% over the last three years.
[00:30:12.680 --> 00:30:15.040] iPhones are really good at photographs now.
[00:30:15.040 --> 00:30:16.360] But yeah, there's still a job there.
[00:30:16.360 --> 00:30:18.880] This is the thing I said about AlphaGo and Lee Sedol.
[00:30:18.880 --> 00:30:20.120] The average level of Go players
[00:30:20.120 --> 00:30:22.560] has gone up exponentially over the last few years.
[00:30:22.560 --> 00:30:24.000] So I think this is augmenting technology
[00:30:24.000 --> 00:30:26.080] as opposed to replacing technology.
[00:30:26.080 --> 00:30:28.000] But there are certain areas where you have to consider
[00:30:28.000 --> 00:30:29.560] what is the future of the rendering industry
[00:30:29.560 --> 00:30:30.440] and other things like that
[00:30:30.440 --> 00:30:32.560] when you can automatically generate any type of asset
[00:30:32.560 --> 00:30:33.960] pretty much in real time.
[00:30:33.960 --> 00:30:35.160] You know?
[00:30:35.160 --> 00:30:37.160] So I think it will create and it will destroy.
[00:30:37.160 --> 00:30:38.440] And this is the nature of technology.
[00:30:38.440 --> 00:30:40.440] Again, the technology is not neutral.
[00:30:40.440 --> 00:30:43.480] Technology is kind of this onrushing thing.
[00:30:43.480 --> 00:30:45.680] And it will never be completely positive.
[00:30:45.680 --> 00:30:50.320] So yeah, this is why I think the important thing
[00:30:50.320 --> 00:30:52.840] is given the pace of this adoption of this technology
[00:30:52.840 --> 00:30:55.000] versus any technology that I saw
[00:30:55.000 --> 00:30:58.600] when I was a hedge fund manager or I was a VC for a bit,
[00:30:58.600 --> 00:30:59.760] people just got to get used to it
[00:30:59.760 --> 00:31:00.920] and they got to really understand it
[00:31:00.920 --> 00:31:03.040] because it's something quite alien
[00:31:03.040 --> 00:31:04.320] and massively impactful.
[00:31:04.320 --> 00:31:05.160] - I could make the,
[00:31:05.160 --> 00:31:07.400] or I think people have made the case
[00:31:07.400 --> 00:31:09.800] that this is so powerful,
[00:31:09.800 --> 00:31:13.640] that there's negative ramifications,
[00:31:13.640 --> 00:31:16.320] the likes of which we don't even know
[00:31:16.320 --> 00:31:21.040] what the implications of this are from a societal standpoint.
[00:31:21.040 --> 00:31:22.960] And I think we've seen some people call
[00:31:22.960 --> 00:31:25.960] for all these things to be pulled back
[00:31:25.960 --> 00:31:27.800] until we have a better understanding
[00:31:27.800 --> 00:31:30.560] and can put in the right constraints and controls.
[00:31:30.560 --> 00:31:32.920] I get the feeling you have a much more optimistic
[00:31:32.920 --> 00:31:34.720] take about human nature
[00:31:34.720 --> 00:31:39.400] and also how these things sort out over time.
[00:31:39.400 --> 00:31:41.840] Can you just talk a little bit about like your perspective
[00:31:41.840 --> 00:31:46.840] on the trade off between openness and accessibility
[00:31:46.840 --> 00:31:52.040] with negative unintended consequences in general?
[00:31:52.040 --> 00:31:53.920] - Yeah, well, so powerful technologies can do anything
[00:31:53.920 --> 00:31:56.160] 'cause these models are general few shot learners
[00:31:56.160 --> 00:31:58.080] as they can learn a little and then they can just do
[00:31:58.080 --> 00:31:59.240] just about anything.
[00:31:59.240 --> 00:32:00.920] It's a very powerful one, right?
[00:32:00.920 --> 00:32:03.280] But like, I was thinking about people saying that,
[00:32:03.280 --> 00:32:05.680] I was like, so why don't they want Indians or Africans
[00:32:05.680 --> 00:32:07.480] to have this technology?
[00:32:07.480 --> 00:32:10.480] Because actually it's an inherently colonialist,
[00:32:10.480 --> 00:32:12.600] kind of racist way to look at the world,
[00:32:12.600 --> 00:32:14.760] because when you ask them, it's "once we educate them enough"
[00:32:14.760 --> 00:32:16.160] or "they don't know better."
[00:32:16.160 --> 00:32:19.320] Because there is this idea
[00:32:19.320 --> 00:32:20.560] that tech people know better,
[00:32:20.560 --> 00:32:23.160] but then it's like nobody elected them.
[00:32:23.160 --> 00:32:24.160] So they're self-appointed.
[00:32:24.160 --> 00:32:25.160] So what's the answer to this?
[00:32:25.160 --> 00:32:28.720] The answer for me was to move towards open,
[00:32:28.720 --> 00:32:30.560] but that widens the discussion.
[00:32:30.560 --> 00:32:32.200] So how many developers now are developing
[00:32:32.200 --> 00:32:33.440] on stable diffusion?
[00:32:33.440 --> 00:32:34.600] Millions.
[00:32:34.600 --> 00:32:35.760] They're all voices now
[00:32:35.760 --> 00:32:38.000] and they're people who weren't developing before.
[00:32:38.000 --> 00:32:39.640] How many governments are talking about it,
[00:32:39.640 --> 00:32:41.080] just about all of them?
[00:32:41.080 --> 00:32:43.640] All of the media studios are talking about it now
[00:32:43.640 --> 00:32:45.240] and it's in the public sphere.
[00:32:45.240 --> 00:32:47.560] And there will be policy debates on this.
[00:32:47.560 --> 00:32:49.080] And that's a good thing.
[00:32:49.080 --> 00:32:51.080] You know, again, we have mechanisms in society
[00:32:51.080 --> 00:32:53.440] to decide about these technologies and other things.
[00:32:53.440 --> 00:32:54.920] And I think the overwhelming output
[00:32:54.920 --> 00:32:56.680] of stable diffusion has been good.
[00:32:56.680 --> 00:32:58.800] Like 4chan and places like that have had this
[00:32:58.800 --> 00:33:01.680] for months now, nothing bad has really come of it.
[00:33:01.680 --> 00:33:03.320] You know, we've had technologies to deepfake
[00:33:03.320 --> 00:33:04.760] and other things, but people haven't realized
[00:33:04.760 --> 00:33:06.480] that it's quite so easy to do.
[00:33:06.480 --> 00:33:07.920] Now they do realize.
[00:33:07.920 --> 00:33:09.960] So my thing has been about,
[00:33:09.960 --> 00:33:12.160] I think it's unethical to control access
[00:33:12.160 --> 00:33:13.600] to powerful technology.
[00:33:13.600 --> 00:33:15.840] This kind of echoes as well the cryptography debates
[00:33:15.840 --> 00:33:17.520] that we had decades ago,
[00:33:17.520 --> 00:33:20.200] whereas like bad guys can use this to do bad things.
[00:33:20.200 --> 00:33:22.520] What would have happened if you'd had mathematics outlawed
[00:33:22.520 --> 00:33:24.000] back then?
[00:33:24.000 --> 00:33:25.600] We wouldn't have cryptography saving us
[00:33:25.600 --> 00:33:27.840] from the bad guys, you know?
[00:33:27.840 --> 00:33:30.120] So I think there was a lot of red teaming
[00:33:30.120 --> 00:33:31.160] because it is dangerous
[00:33:31.160 --> 00:33:33.280] and because only big corporations could do this
[00:33:33.280 --> 00:33:35.480] and they were too afraid to release it.
[00:33:35.480 --> 00:33:37.240] I don't think there was enough green teaming
[00:33:37.240 --> 00:33:38.880] of what could be the positives from this.
[00:33:38.880 --> 00:33:39.880] 'Cause like I said,
[00:33:39.880 --> 00:33:41.800] how many people in the world can create next year
[00:33:41.800 --> 00:33:43.520] who couldn't create because of us?
[00:33:43.520 --> 00:33:46.320] Hundreds of millions.
[00:33:46.320 --> 00:33:49.240] And that is a net benefit and happiness to society.
[00:33:49.240 --> 00:33:51.760] But that isn't reflected in anyone's bottom line, right?
[00:33:51.760 --> 00:33:53.600] Well, maybe ours, but you know what I mean.
[00:33:53.600 --> 00:33:55.560] So I think this is a very complicated thing
[00:33:55.560 --> 00:33:57.760] and people talk about it in three terms.
[00:33:57.760 --> 00:33:58.720] There's ethics.
[00:33:58.720 --> 00:34:01.560] And ethics is my personal and your personal.
[00:34:01.560 --> 00:34:03.600] I don't think anyone has a right
[00:34:03.600 --> 00:34:05.360] to call anyone else unethical
[00:34:05.360 --> 00:34:07.280] unless they really understand that person.
[00:34:07.280 --> 00:34:09.560] Morals are what we decide as society.
[00:34:09.560 --> 00:34:12.040] But my morals in the UK are different to your morals
[00:34:12.040 --> 00:34:15.160] and the US are different to Indian and Chinese morals.
[00:34:15.160 --> 00:34:16.680] And then finally, there's legal.
[00:34:16.680 --> 00:34:18.360] And legal are these codifications
[00:34:18.360 --> 00:34:21.040] of moral boundaries that we kind of put in.
[00:34:21.040 --> 00:34:23.040] And we need to catch up on all of these.
[00:34:23.040 --> 00:34:25.120] One final point I'd like to make: the alternative
[00:34:25.120 --> 00:34:26.800] is that this technology is the preserve
[00:34:26.800 --> 00:34:30.080] of large companies who mostly focus on ads.
[00:34:30.080 --> 00:34:33.000] And it's really persuasive this technology.
[00:34:33.000 --> 00:34:36.360] So like we've got human realistic emotional voices
[00:34:36.360 --> 00:34:38.080] and faces and other things.
[00:34:38.080 --> 00:34:39.480] What would happen if your chat assistant
[00:34:39.480 --> 00:34:42.320] started whispering at you to buy something?
[00:34:42.320 --> 00:34:44.400] What's the regulation legislation around that?
[00:34:44.400 --> 00:34:46.640] What's the legislation around really large language models
[00:34:46.640 --> 00:34:48.400] as opposed to our tiny models
[00:34:48.400 --> 00:34:50.320] that can get to human level
[00:34:50.320 --> 00:34:52.840] and should only big tech be allowed to use that?
[00:34:52.840 --> 00:34:54.680] I think it should all be regulated.
[00:34:54.680 --> 00:34:56.720] I don't know what that regulation should be.
[00:34:56.720 --> 00:34:58.040] And I will give my voice to it.
[00:34:58.040 --> 00:35:00.720] And I hope more people give their voices to it
[00:35:00.720 --> 00:35:03.560] instead of it being just decided behind closed doors.
[00:35:03.560 --> 00:35:05.560] So it's quite a complex thing.
[00:35:05.560 --> 00:35:07.840] I don't think there's any governance structure currently
[00:35:07.840 --> 00:35:10.520] that I've seen that can handle it.
[00:35:10.520 --> 00:35:11.560] I think that we should work together
[00:35:11.560 --> 00:35:13.400] to put these things in place.
[00:35:13.400 --> 00:35:15.840] - Now, the state of the industry today,
[00:35:15.840 --> 00:35:18.560] there's obviously the large companies
[00:35:18.560 --> 00:35:22.600] that have some advertising related business, Google,
[00:35:22.600 --> 00:35:25.000] Facebook, et cetera, Microsoft.
[00:35:25.000 --> 00:35:28.560] And then there's the private companies
[00:35:28.560 --> 00:35:30.800] kind of going after this opportunity.
[00:35:30.800 --> 00:35:33.880] In my mind, there's businesses like,
[00:35:33.880 --> 00:35:38.160] we talked about Runway or Midjourney or some of those,
[00:35:38.160 --> 00:35:40.600] but the two fundamental companies
[00:35:40.600 --> 00:35:41.840] that seem to be going after this,
[00:35:41.840 --> 00:35:43.840] or at least when I talk to people I hear about,
[00:35:43.840 --> 00:35:46.320] are you all and then OpenAI,
[00:35:46.320 --> 00:35:48.160] who we've referenced before.
[00:35:48.160 --> 00:35:51.800] How do you contrast your style
[00:35:51.800 --> 00:35:54.760] and approach to what they're doing?
[00:35:54.760 --> 00:35:57.040] - So, yeah, I think OpenAI and us
[00:35:57.040 --> 00:35:58.920] are the only independent multi-modal companies.
[00:35:58.920 --> 00:36:02.480] Multi-modal means that we do all of the types of models, right?
[00:36:02.480 --> 00:36:05.280] Again, Runway do fantastic work around video; Midjourney,
[00:36:05.280 --> 00:36:08.640] David is super focused on kind of images
[00:36:08.640 --> 00:36:11.040] and, in the future, video games that are streamed.
[00:36:11.040 --> 00:36:12.680] That'll be super cool.
[00:36:12.680 --> 00:36:15.120] We're kind of foundation layer companies as it were.
[00:36:15.120 --> 00:36:16.320] So we're building the building blocks
[00:36:16.320 --> 00:36:17.640] that make this accessible.
[00:36:17.640 --> 00:36:20.640] OpenAI kind of emerged from Elon Musk
[00:36:20.640 --> 00:36:22.560] and a whole bunch of others wanting to build
[00:36:22.560 --> 00:36:26.040] an open nonprofit for getting to AGI,
[00:36:26.040 --> 00:36:28.840] this AI that can do anything to augment human potential.
[00:36:28.840 --> 00:36:31.920] It had ups and downs and changed over the years,
[00:36:31.920 --> 00:36:33.280] but it's doing amazing work right now,
[00:36:33.280 --> 00:36:36.040] but their objective is that generalized intelligence
[00:36:36.040 --> 00:36:38.080] and their model has moved more closed now
[00:36:38.080 --> 00:36:39.280] because they think it's dangerous.
[00:36:39.280 --> 00:36:40.880] Although they've released amazing open-source stuff,
[00:36:40.880 --> 00:36:42.680] so they just released a good tokenizer,
[00:36:42.680 --> 00:36:44.360] there is Whisper, which is one of the best things
[00:36:44.360 --> 00:36:47.000] to turn this podcast into text, et cetera.
[00:36:47.000 --> 00:36:48.600] So, yeah, they're selective about that
[00:36:48.600 --> 00:36:50.280] and that's a prerogative.
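Whisper is indeed openly released, so turning an episode like this into text is a few lines of Python. A minimal sketch, assuming the openai-whisper package is installed and a local audio file named "episode.mp3" (a hypothetical path):

    # Transcribe an audio file with OpenAI's open-source Whisper model.
    # Assumes: pip install openai-whisper (and ffmpeg available on the system path).
    import whisper

    model = whisper.load_model("base")         # small multilingual checkpoint
    result = model.transcribe("episode.mp3")   # hypothetical local file
    print(result["text"][:500])                # first 500 characters of the transcript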
[00:36:50.280 --> 00:36:52.640] Their model is data to the models.
[00:36:52.640 --> 00:36:54.640] So you can fine-tune their GPT,
[00:36:54.640 --> 00:36:55.560] which is a language model,
[00:36:55.560 --> 00:36:57.240] and soon, I'm sure, their image model,
[00:36:57.240 --> 00:36:59.320] hopefully they'll open-source GPT-3 as well,
[00:36:59.320 --> 00:37:01.640] and DALL·E 2, which is their new image model,
[00:37:01.640 --> 00:37:03.200] now that we've shown that it's relatively safe,
[00:37:03.200 --> 00:37:04.760] 'cause they wanna make their models better.
[00:37:04.760 --> 00:37:06.120] And then they have a deal with Microsoft
[00:37:06.120 --> 00:37:08.560] where Microsoft commercializes their models
[00:37:08.560 --> 00:37:12.640] and then funds them, which is a great kind of partnership.
[00:37:12.640 --> 00:37:13.640] Our model's a bit different
[00:37:13.640 --> 00:37:16.480] and our model is models to the data,
[00:37:16.480 --> 00:37:18.280] whereby we're creating open-source models
[00:37:18.280 --> 00:37:20.000] that you can take onto your own code base
[00:37:20.000 --> 00:37:21.840] or your own asset images,
[00:37:21.840 --> 00:37:24.720] and we've teamed up with AWS SageMaker for this,
[00:37:24.720 --> 00:37:27.560] and then you can customize it to your own experiences.
[00:37:27.560 --> 00:37:29.480] We don't really care about taking customer data
[00:37:29.480 --> 00:37:31.080] and using it to improve our own models.
[00:37:31.080 --> 00:37:34.320] Instead, we're like scale and service is the way.
[00:37:34.320 --> 00:37:36.960] You know, so if you wanna customize version of the model,
[00:37:36.960 --> 00:37:38.600] the best people to come to is us
[00:37:38.600 --> 00:37:40.080] and it'll be a million dollars a pop
[00:37:40.080 --> 00:37:42.080] to train those things up, you know,
[00:37:42.080 --> 00:37:43.560] actually more in some cases.
[00:37:43.560 --> 00:37:46.000] If you wanna scale it, scaling these models is hard,
[00:37:46.000 --> 00:37:47.800] but we can scale your customized models for you
[00:37:47.800 --> 00:37:50.360] and again, we will have a fair deal on that one.
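To make the "models to the data" idea concrete, here is a minimal sketch of running an open Stable Diffusion checkpoint on your own hardware with the Hugging Face diffusers library; the model id and prompt are illustrative, and fine-tuning (DreamBooth, textual inversion, or a full custom training run) then builds on these same locally held weights rather than shipping your data to a vendor:

    # Run an open Stable Diffusion checkpoint locally: weights and prompts stay on
    # your own infrastructure. Assumes: pip install diffusers transformers torch, plus a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",   # public open-source checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe("concept art of a lighthouse at dawn").images[0]
    image.save("lighthouse.png")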
[00:37:50.360 --> 00:37:52.080] So even though we overlap in certain areas,
[00:37:52.080 --> 00:37:53.600] I think we have very different philosophies
[00:37:53.600 --> 00:37:55.840] 'cause also our philosophy is getting this AI out to everyone
[00:37:55.840 --> 00:37:58.000] and having everyone have their own personalized models
[00:37:58.000 --> 00:38:00.240] versus building an AI that can do anything.
[00:38:00.240 --> 00:38:02.200] And I think it's quite complementary as well
[00:38:02.200 --> 00:38:05.120] 'cause you'll always have a Windows, Linux, you know,
[00:38:05.120 --> 00:38:07.440] kind of thing going on there,
[00:38:07.440 --> 00:38:09.320] Oracle, MySQL, et cetera.
[00:38:09.320 --> 00:38:13.440] So yeah, I think that hopefully kind of defines it.
[00:38:13.440 --> 00:38:15.400] I think the final thing as well is that their focus
[00:38:15.400 --> 00:38:16.480] has been on language models,
[00:38:16.480 --> 00:38:19.600] such as the amazing ChatGPT they've just released,
[00:38:19.600 --> 00:38:23.400] with some image, whereas our focus is on media models
[00:38:23.400 --> 00:38:27.000] with some elements of language and others.
[00:38:27.000 --> 00:38:28.840] I think in the respective spaces,
[00:38:28.840 --> 00:38:30.720] there's a lot of people who are quite amazing
[00:38:30.720 --> 00:38:32.360] doing language models.
[00:38:32.360 --> 00:38:35.800] I think we're the only ones doing media models at scale.
[00:38:35.800 --> 00:38:37.480] - And inherent in all of that,
[00:38:37.480 --> 00:38:41.880] the approach and the philosophy is the open source nature
[00:38:41.880 --> 00:38:44.240] of what you guys are doing, right?
[00:38:44.240 --> 00:38:47.040] And as you think about the business model
[00:38:47.040 --> 00:38:48.080] around the open source,
[00:38:48.080 --> 00:38:51.800] you touched on the service and management
[00:38:51.800 --> 00:38:54.040] that you can have with different customers.
[00:38:54.040 --> 00:38:57.600] But how do you think about what the ultimate business model
[00:38:57.600 --> 00:39:00.520] for stability AI is going to be over time?
[00:39:00.520 --> 00:39:01.960] Does it stay around that?
[00:39:01.960 --> 00:39:03.400] You sell different applications
[00:39:03.400 --> 00:39:05.320] on top of the existing primitives.
[00:39:05.320 --> 00:39:08.400] What do you think the end state of this is for you?
[00:39:08.400 --> 00:39:10.000] - We're fully vertically integrated.
[00:39:10.000 --> 00:39:11.120] So we have our own products,
[00:39:11.120 --> 00:39:14.560] like DreamStudio Pro that we're releasing in January,
[00:39:14.560 --> 00:39:16.560] where you can generate movies in real time
[00:39:16.560 --> 00:39:18.600] and storyboarding and 3D cameras
[00:39:18.600 --> 00:39:20.400] and audio integration with our audio models
[00:39:20.400 --> 00:39:21.320] and things like that.
[00:39:21.320 --> 00:39:22.520] 'Cause the models need to be used.
[00:39:22.520 --> 00:39:25.200] We've also got integrations into Photoshop
[00:39:25.200 --> 00:39:27.720] and all the other kind of interfaces as well,
[00:39:27.720 --> 00:39:30.880] where you can use our services and custom models soon.
[00:39:30.880 --> 00:39:32.520] So I think that's a really nice place to be.
[00:39:32.520 --> 00:39:34.200] It's the layer one for AI.
[00:39:34.200 --> 00:39:35.680] And you know, we support the whole sector.
[00:39:35.680 --> 00:39:38.360] So when certain API companies who aren't us
[00:39:38.360 --> 00:39:40.520] got stuck on GPUs, we unstuck it.
[00:39:40.520 --> 00:39:41.640] When mid-journey was going,
[00:39:41.640 --> 00:39:44.800] we gave them a small grant to get going, to do the beta.
[00:39:44.800 --> 00:39:45.760] 'Cause we thought it was amazing
[00:39:45.760 --> 00:39:47.560] to have this technology out there.
[00:39:47.560 --> 00:39:48.840] So we really view ourselves
[00:39:48.840 --> 00:39:50.960] with that infrastructure layer, picks and shovels,
[00:39:50.960 --> 00:39:53.680] as it were, and then other people build on top of what we do.
[00:39:53.680 --> 00:39:55.960] You come to us, if you wanna have
[00:39:55.960 --> 00:39:57.560] the vertically integrated best people
[00:39:57.560 --> 00:39:59.120] in the world working with you.
[00:39:59.120 --> 00:40:00.600] And every media company in the world
[00:40:00.600 --> 00:40:02.400] and video game company needs that.
[00:40:02.400 --> 00:40:04.560] And there's no other alternative.
[00:40:04.560 --> 00:40:07.280] Because we build the models, we put them out there,
[00:40:07.280 --> 00:40:10.560] and we also make them usable through our software.
[00:40:10.560 --> 00:40:11.960] But, you know, we're not gonna have
[00:40:11.960 --> 00:40:13.160] a huge number of customers.
[00:40:13.160 --> 00:40:14.640] It's gonna be very selective,
[00:40:14.640 --> 00:40:16.360] similar to a Palantir type thing.
[00:40:16.360 --> 00:40:20.040] And then the tail is where we collaborate with partners
[00:40:20.040 --> 00:40:23.080] like AWS, who make our models available for everyone.
[00:40:23.080 --> 00:40:24.400] And we'll have more and more services
[00:40:24.400 --> 00:40:26.200] around those across modalities.
[00:40:26.200 --> 00:40:29.040] - How do you think about the difficulty
[00:40:29.040 --> 00:40:33.840] of model and technology between image and text?
[00:40:33.840 --> 00:40:37.600] So obviously there's differences between DALL·E 2
[00:40:37.600 --> 00:40:39.320] and Stable Diffusion.
[00:40:39.320 --> 00:40:41.040] But I know you guys are working
[00:40:41.040 --> 00:40:42.280] on a bunch of different models.
[00:40:42.280 --> 00:40:46.360] And is the difficulty of making language work harder
[00:40:46.360 --> 00:40:49.720] than image as you think about the problem set?
[00:40:49.720 --> 00:40:51.720] Are they roughly the same?
[00:40:51.720 --> 00:40:53.240] - They're roughly the same.
[00:40:53.240 --> 00:40:55.160] Language is a lot more semantically dense,
[00:40:55.160 --> 00:40:56.840] which is why the language models are a lot bigger.
[00:40:56.840 --> 00:40:59.800] So it's incredibly difficult to get them to work on the edge.
[00:40:59.800 --> 00:41:02.440] With Stable Diffusion, we announced Distilled Stable Diffusion,
[00:41:02.440 --> 00:41:04.120] which is a 20 times speed up,
[00:41:04.120 --> 00:41:06.040] which will mean that we've sped up a hundred times
[00:41:06.040 --> 00:41:07.960] since launch in August.
[00:41:07.960 --> 00:41:09.320] That means it'll work in one second
[00:41:09.320 --> 00:41:11.840] on an iPhone without internet, maybe two seconds.
[00:41:11.840 --> 00:41:12.680] That's insane.
[00:41:12.680 --> 00:41:14.720] Language models cannot do that.
[00:41:14.720 --> 00:41:17.640] But language models, you can make great.
[00:41:17.640 --> 00:41:19.600] So like I said, ChatGPT is an example
[00:41:19.600 --> 00:41:22.320] that's hitting all of the press around that right now.
[00:41:22.320 --> 00:41:25.920] So I think this challenge of making these models accessible
[00:41:25.920 --> 00:41:27.720] is gonna be the big one for us.
[00:41:27.720 --> 00:41:29.240] And again, this is kind of our focus
[00:41:29.240 --> 00:41:30.680] versus everyone else.
[00:41:30.680 --> 00:41:32.520] Nobody else in the industry is focused on
[00:41:32.520 --> 00:41:35.400] how do you get these things working on mobile?
[00:41:35.400 --> 00:41:36.720] Because that's not their prerogative.
[00:41:36.720 --> 00:41:39.640] Or how do you build an Indian version of this, et cetera?
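One lever behind those latency numbers is simply how many denoising steps you run and with which sampler; distillation goes much further by training a student that needs only a handful of steps. A rough sketch of the step-count lever with the diffusers library (the model id, prompt, and step count are illustrative, not the distilled model itself):

    # Fewer denoising steps with a faster sampler => roughly proportional speed-up.
    # Assumes: pip install diffusers transformers torch, and a CUDA GPU.
    import torch
    from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

    # ~20 steps instead of the ~50 an older sampler needs for similar quality.
    image = pipe("a watercolor map of an imaginary city",
                 num_inference_steps=20).images[0]
    image.save("city.png")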
[00:41:39.640 --> 00:41:42.800] - And so do you think that specialized models
[00:41:42.800 --> 00:41:45.880] will exist and thrive?
[00:41:45.880 --> 00:41:50.000] Or do you think eventually these converge
[00:41:50.000 --> 00:41:52.080] on large multimodal models?
[00:41:52.080 --> 00:41:53.480] I guess in other words, like,
[00:41:53.480 --> 00:41:55.400] will there be significant leading models
[00:41:55.400 --> 00:41:58.920] for just text and language and just images and proteins?
[00:42:00.280 --> 00:42:03.800] Or will there be one large model that's most effective?
[00:42:03.800 --> 00:42:04.840] - I think it'll be a mixture.
[00:42:04.840 --> 00:42:06.920] I think every, if you kind of look at it,
[00:42:06.920 --> 00:42:08.600] what happened is the models got bigger and bigger
[00:42:08.600 --> 00:42:10.600] and bigger, trillions of parameters.
[00:42:10.600 --> 00:42:11.440] - Great.
[00:42:11.440 --> 00:42:13.200] - Unwieldy and inaccessible for anyone.
[00:42:13.200 --> 00:42:16.080] Then it turned out that actually we weren't paying enough
[00:42:16.080 --> 00:42:17.320] attention to the data.
[00:42:17.320 --> 00:42:19.560] So DeepMind released something called Chinchilla,
[00:42:19.560 --> 00:42:23.720] which was a version of GPT-3, which is 175 billion parameters,
[00:42:23.720 --> 00:42:26.680] but in 70 billion parameters, that outperformed it.
[00:42:26.680 --> 00:42:28.240] 'Cause they just trained it longer.
[00:42:28.240 --> 00:42:30.080] But if you look at the actual import of that paper,
[00:42:30.080 --> 00:42:33.600] it was just you need better data and training longer.
[00:42:33.600 --> 00:42:35.440] So we didn't really know how to optimize these models
[00:42:35.440 --> 00:42:38.240] 'cause they were so big and so compute intensive,
[00:42:38.240 --> 00:42:39.640] 'cause they cost millions of dollars each
[00:42:39.640 --> 00:42:42.400] that we didn't really see what the differentials are
[00:42:42.400 --> 00:42:43.880] from data from training and others.
[00:42:43.880 --> 00:42:45.720] There's a lot of model optimization to go.
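For readers who want the Chinchilla finding in numbers: the paper's rough rule of thumb is about 20 training tokens per parameter for a compute-optimal model, with training compute around 6 × N × D floating-point operations. A back-of-the-envelope sketch (the constants are the paper's approximations; the model sizes are illustrative):

    # Chinchilla-style back-of-the-envelope: tokens ~= 20 * params, FLOPs ~= 6 * N * D.
    def chinchilla_optimal(params: float, tokens_per_param: float = 20.0):
        tokens = tokens_per_param * params
        flops = 6.0 * params * tokens
        return tokens, flops

    for n in (70e9, 175e9):   # Chinchilla-sized vs. GPT-3-sized models
        tokens, flops = chinchilla_optimal(n)
        print(f"{n / 1e9:.0f}B params -> ~{tokens / 1e12:.1f}T tokens, ~{flops:.2e} FLOPs")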
[00:42:45.720 --> 00:42:47.040] One of the big breakthroughs now though,
[00:42:47.040 --> 00:42:49.440] is that we've moved from just deep learning.
[00:42:49.440 --> 00:42:52.120] So these big supercomputers squishing the data,
[00:42:52.120 --> 00:42:54.240] moving it back and forth across all these chips,
[00:42:54.240 --> 00:42:56.120] to now introducing reinforcement learning
[00:42:56.120 --> 00:42:57.520] with human feedback.
[00:42:57.520 --> 00:42:58.960] That's where you see how these models
[00:42:58.960 --> 00:43:00.440] have all the little neurons,
[00:43:00.440 --> 00:43:01.800] which are these principles they've learned,
[00:43:01.800 --> 00:43:04.600] light up, when humans actually interact with them.
[00:43:04.600 --> 00:43:06.280] And you use that to create more specific,
[00:43:06.280 --> 00:43:07.680] optimized models.
[00:43:07.680 --> 00:43:11.640] So OpenAI, again, like the leaders in this field,
[00:43:11.640 --> 00:43:15.320] they created InstructGPT by figuring that out
[00:43:15.320 --> 00:43:16.320] from GPT-3.
[00:43:16.320 --> 00:43:18.280] It went from 175 billion parameters
[00:43:18.280 --> 00:43:22.280] to 1.3 billion parameters, with just as much performance.
[00:43:22.280 --> 00:43:23.800] And this is one of the things that I think
[00:43:23.800 --> 00:43:26.040] will really drive it, the combination of deep learning
[00:43:26.040 --> 00:43:27.880] and understanding how humans interact with these models
[00:43:27.880 --> 00:43:29.600] to make better models,
[00:43:29.600 --> 00:43:33.040] and to get better data to build better models again.
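The core of that human-feedback step is a reward model trained on preference pairs: humans pick which of two responses is better, and the model learns to score the chosen one higher; that reward then guides the fine-tuning that produced InstructGPT. A toy sketch with a tiny stand-in network (real systems put this head on a language-model backbone, and the random tensors here are purely illustrative):

    # Pairwise preference loss used for RLHF reward models:
    # loss = -log(sigmoid(reward(chosen) - reward(rejected)))
    import torch
    import torch.nn as nn

    class TinyRewardModel(nn.Module):
        def __init__(self, dim: int = 64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return self.net(features).squeeze(-1)   # one scalar reward per example

    model = TinyRewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    chosen, rejected = torch.randn(256, 64), torch.randn(256, 64)  # stand-in embeddings

    for _ in range(100):
        loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The trained reward model is then used (e.g. with PPO) to fine-tune the base model.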
[00:43:33.040 --> 00:43:35.640] We're reaching that point now of rapid iteration and feedback,
[00:43:35.640 --> 00:43:37.440] and that's what we saw with stable diffusion.
[00:43:37.440 --> 00:43:39.800] We've got 100 times speed up.
[00:43:39.800 --> 00:43:41.560] So it's 100 times faster than DALL·E 2
[00:43:41.560 --> 00:43:43.640] right now, basically.
[00:43:43.640 --> 00:43:45.640] We'll release it shortly.
[00:43:45.640 --> 00:43:47.080] But there's more to go.
[00:43:47.080 --> 00:43:48.720] And so I think, again,
[00:43:48.720 --> 00:43:50.400] none of this really makes sense, to be honest,
[00:43:50.400 --> 00:43:53.720] Logan, like the fact that a 1.6 gigabyte file
[00:43:53.720 --> 00:43:56.240] can contain two billion different concepts
[00:43:56.240 --> 00:43:57.760] and create just about anything now,
[00:43:57.760 --> 00:44:01.000] pretty much photo realistic, doesn't make sense.
[00:44:01.000 --> 00:44:02.120] It's insane.
[00:44:02.120 --> 00:44:02.960] But it's there.
[00:44:02.960 --> 00:44:04.800] And the fact that it can run on an iPhone without internet
[00:44:04.800 --> 00:44:06.240] doesn't make sense.
[00:44:06.240 --> 00:44:08.240] You know, it's orders of magnitude better.
[00:44:08.240 --> 00:44:09.600] And the question is,
[00:44:09.600 --> 00:44:11.400] how are we gonna react to this when it happens
[00:44:11.400 --> 00:44:12.920] for every modality?
[00:44:12.920 --> 00:44:14.640] 'Cause some modalities are more difficult than others,
[00:44:14.640 --> 00:44:16.800] like 3D is very difficult.
[00:44:16.800 --> 00:44:18.680] But maybe they'll figure out how to make it easy.
[00:44:18.680 --> 00:44:19.600] I don't know.
[00:44:19.600 --> 00:44:21.120] You know, video is difficult.
[00:44:21.120 --> 00:44:22.800] We're figuring out how to make it easier.
[00:44:22.800 --> 00:44:27.240] I would note that 80% of all AI researchers are in this area,
[00:44:27.920 --> 00:44:29.800] and I reckon $100 billion of investment
[00:44:29.800 --> 00:44:32.640] will go into this area to accelerate it even more.
[00:44:32.640 --> 00:44:37.280] - And as you think about your core use case and customers,
[00:44:37.280 --> 00:44:40.040] you mentioned closer to the Palantir model,
[00:44:40.040 --> 00:44:43.640] do you think it's mostly gonna be that media
[00:44:43.640 --> 00:44:46.400] and content are gonna be kind of the core businesses
[00:44:46.400 --> 00:44:49.400] that you service or are there other use cases
[00:44:49.400 --> 00:44:51.840] or target customers that you're excited about?
[00:44:51.840 --> 00:44:53.200] - I mean, like I think we can disrupt
[00:44:53.200 --> 00:44:55.800] the whole of the pharmaceutical industry and healthcare.
[00:44:56.800 --> 00:44:58.720] And again, it's an area that we know very well
[00:44:58.720 --> 00:45:00.360] and we've got protein folding and other things,
[00:45:00.360 --> 00:45:01.600] but these models can be applied
[00:45:01.600 --> 00:45:03.760] to save billions of dollars there.
[00:45:03.760 --> 00:45:05.080] - And why is that the case?
[00:45:05.080 --> 00:45:09.560] - Well, because the whole area is kind of...
[00:45:09.560 --> 00:45:13.560] Classical systems are ergodic.
[00:45:13.560 --> 00:45:16.880] So they treat everyone as normally distributed,
[00:45:16.880 --> 00:45:17.960] like that, right?
[00:45:17.960 --> 00:45:19.560] It's like assuming a thousand tosses of a coin
[00:45:19.560 --> 00:45:21.200] is the same as one coin tossed a thousand times,
[00:45:21.200 --> 00:45:22.960] but people are very individualized.
[00:45:22.960 --> 00:45:24.280] We didn't have the systems to be able
[00:45:24.280 --> 00:45:26.920] to have personalization and understanding of principles
[00:45:26.920 --> 00:45:28.320] and we've got them both at the same time now
[00:45:28.320 --> 00:45:29.560] with this technology.
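A standard toy illustration of that ergodicity point, not the speaker's own example and with made-up numbers: for a multiplicative gamble the ensemble average looks great while the typical individual path shrinks, which is exactly the trap of treating everyone as the average patient:

    # Ergodicity toy: +50% on heads, -40% on tails, starting wealth 1.
    # The ensemble average grows ~1.05x per round; the typical path shrinks ~0.95x per round.
    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_rounds = 100_000, 50
    factors = np.where(rng.random((n_people, n_rounds)) < 0.5, 1.5, 0.6)
    wealth = factors.prod(axis=1)

    print("expected growth per round:", 0.5 * 1.5 + 0.5 * 0.6)   # 1.05 (ensemble view)
    print("typical growth per round:", (1.5 * 0.6) ** 0.5)       # ~0.95 (individual view)
    print("mean final wealth:", wealth.mean())        # pulled up by a few lucky outliers
    print("median final wealth:", np.median(wealth))  # most individuals end up poorer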
[00:45:29.560 --> 00:45:31.960] When you apply this appropriately to healthcare and bio,
[00:45:31.960 --> 00:45:33.040] you have massive breakthroughs
[00:45:33.040 --> 00:45:34.400] and there are drug development companies
[00:45:34.400 --> 00:45:35.560] and others doing that,
[00:45:35.560 --> 00:45:37.080] but they're all building their own infrastructure
[00:45:37.080 --> 00:45:39.520] when they should be using a unified infrastructure.
[00:45:39.520 --> 00:45:41.960] And this is where you open source is super powerful
[00:45:41.960 --> 00:45:43.400] 'cause I'll give you an example,
[00:45:43.400 --> 00:45:45.560] by releasing our model and having the traction,
[00:45:45.560 --> 00:45:48.440] Apple optimized it for the neural engine on the M1
[00:45:48.440 --> 00:45:50.040] and the other architectures.
[00:45:50.040 --> 00:45:53.840] It's the first model ever to basically have that.
[00:45:54.320 --> 00:45:55.600] You can set the standards around this
[00:45:55.600 --> 00:45:57.560] and you can really drive forward the sectors.
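What "optimized for the Neural Engine" looks like in practice is a Core ML conversion, which Apple published for Stable Diffusion in its apple/ml-stable-diffusion repository. A hedged sketch of the general mechanism with coremltools, using a tiny toy network in place of the real UNet:

    # Convert a (toy) PyTorch module to Core ML so Apple's runtime can schedule it
    # across CPU, GPU, and the Neural Engine. Assumes: pip install torch coremltools.
    import torch
    import coremltools as ct

    toy = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU())
    traced = torch.jit.trace(toy.eval(), torch.randn(1, 3, 64, 64))

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=(1, 3, 64, 64))],
        convert_to="mlprogram",            # ML Program format (.mlpackage)
        compute_units=ct.ComputeUnit.ALL,  # let Core ML use CPU, GPU, and Neural Engine
    )
    mlmodel.save("toy.mlpackage")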
[00:45:57.560 --> 00:45:59.320] And from a business perspective, again,
[00:45:59.320 --> 00:46:00.720] we are the business that does this.
[00:46:00.720 --> 00:46:02.640] There is nobody else.
[00:46:02.640 --> 00:46:03.480] That's multimodal.
[00:46:03.480 --> 00:46:06.440] There is nobody else that is media focused initially.
[00:46:06.440 --> 00:46:09.840] So media is our number one thing, media and video games.
[00:46:09.840 --> 00:46:11.480] And so if anyone wants to have the best,
[00:46:11.480 --> 00:46:12.600] they come to us.
[00:46:12.600 --> 00:46:14.200] But we can only work with a few entities,
[00:46:14.200 --> 00:46:16.200] but that's okay because we can have an entire
[00:46:16.200 --> 00:46:17.520] massive business based on that,
[00:46:17.520 --> 00:46:20.200] just like Google has a great business based on that.
[00:46:20.200 --> 00:46:22.040] You know, these are the sectors.
[00:46:22.040 --> 00:46:23.600] We can do the sectors one by one by one,
[00:46:23.600 --> 00:46:25.760] but I don't see any sector that's not affected
[00:46:25.760 --> 00:46:27.240] by this technology.
[00:46:27.240 --> 00:46:30.120] The only question is for us as a business,
[00:46:30.120 --> 00:46:32.520] who do we work with partners to go out to
[00:46:32.520 --> 00:46:35.040] with this infrastructure and who do we do ourselves?
[00:46:35.040 --> 00:46:37.560] I mean, it's kind of like what the promise of web three was
[00:46:37.560 --> 00:46:40.480] many years ago, but web three was always an economic incentive
[00:46:40.480 --> 00:46:43.400] that was trying to be bootstrapped to real life use cases
[00:46:43.400 --> 00:46:46.120] whereas this is real life use cases right now.
[00:46:46.120 --> 00:46:48.080] There's a living value right now.
[00:46:48.080 --> 00:46:49.440] I think that's why it's exploded.
[00:46:49.440 --> 00:46:51.680] And next year we'll go even bigger.
[00:46:51.680 --> 00:46:54.840] - One of the interesting elements of this is it's really coming,
[00:46:54.840 --> 00:46:59.320] I mean, historically automation and AI have been assumed
[00:46:59.320 --> 00:47:02.080] to start at the simpler levels, right?
[00:47:02.080 --> 00:47:04.560] And move their way up, or at the more manual levels,
[00:47:04.560 --> 00:47:07.120] and move their way up the stack.
[00:47:07.120 --> 00:47:10.560] This is actually fundamentally shifting that paradigm
[00:47:10.560 --> 00:47:14.120] and coming at it from the creative,
[00:47:14.120 --> 00:47:19.120] more knowledge worker based systems and processes
[00:47:19.680 --> 00:47:23.600] and actually potentially automating away a bunch
[00:47:23.600 --> 00:47:25.680] of those different jobs.
[00:47:25.680 --> 00:47:28.600] When you think about like no industry is not going to be
[00:47:28.600 --> 00:47:31.800] impacted by this in some way.
[00:47:31.800 --> 00:47:34.120] I'm just fascinated to hear you riff on some
[00:47:34.120 --> 00:47:36.000] of the different use cases or industries.
[00:47:36.000 --> 00:47:39.680] Have you seen it applied in any ways
[00:47:39.680 --> 00:47:42.360] that were unexpected to you since this has been out
[00:47:42.360 --> 00:47:45.320] in the world in August that would be tangible
[00:47:45.320 --> 00:47:46.800] for people to hear about?
[00:47:47.920 --> 00:47:50.080] - I mean, it's been applied in crazy ways.
[00:47:50.080 --> 00:47:54.560] People who use it to create 3D VR simulations instantly.
[00:47:54.560 --> 00:47:58.520] It was used to create synthetic data on lung scans
[00:47:58.520 --> 00:48:01.200] to identify cancer by Stanford AIMI,
[00:48:01.200 --> 00:48:03.880] because they didn't have enough data sets,
[00:48:03.880 --> 00:48:06.040] so they created more data sets like the ones they had.
[00:48:06.040 --> 00:48:08.240] Yesterday there was something called Riffusion,
[00:48:08.240 --> 00:48:12.120] whereby they took spectrograms of music
[00:48:12.120 --> 00:48:13.600] and trained it on that.
[00:48:13.600 --> 00:48:15.520] And now from those spectrograms, you know,
[00:48:15.520 --> 00:48:17.840] the little things, it can generate brand new music
[00:48:17.840 --> 00:48:22.320] of any type, which is a bit crazy.
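The trick behind Riffusion is that a spectrogram is just an image, so an image model can generate one and classical signal processing turns it back into sound. A minimal sketch of that last inversion step using Griffin-Lim phase reconstruction, with a random magnitude spectrogram standing in for a generated one:

    # Invert a magnitude spectrogram to audio with Griffin-Lim.
    # Assumes: pip install librosa soundfile numpy; the spectrogram here is random noise.
    import numpy as np
    import librosa
    import soundfile as sf

    sr = 22050
    fake_spectrogram = np.abs(np.random.randn(1025, 400)).astype(np.float32)  # (freq, time)

    audio = librosa.griffinlim(fake_spectrogram, n_iter=32, hop_length=512)   # estimate phase
    sf.write("generated.wav", audio, sr)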
[00:48:22.320 --> 00:48:25.040] So I think that nobody really knows
[00:48:25.040 --> 00:48:26.480] what the long-term implications of this are.
[00:48:26.480 --> 00:48:28.280] A lot of people think that it should just be words
[00:48:28.280 --> 00:48:31.560] in, images out, or words in, text out,
[00:48:31.560 --> 00:48:32.920] but the real impact's going to come
[00:48:32.920 --> 00:48:34.880] when people really sit down and think about
[00:48:34.880 --> 00:48:37.080] in which parts of the creative process,
[00:48:37.080 --> 00:48:40.400] the constructive process, the office process,
[00:48:40.400 --> 00:48:43.480] would a little bit of some sort of entity
[00:48:43.480 --> 00:48:45.280] that understands the nature of structured
[00:48:45.280 --> 00:48:47.720] and unstructured data, and the blurring of the barriers
[00:48:47.720 --> 00:48:49.840] between them, be super useful.
[00:48:49.840 --> 00:48:52.440] And I can't really think of many things
[00:48:52.440 --> 00:48:55.840] that aren't disrupted by that in the knowledge workspace.
[00:48:55.840 --> 00:48:58.240] In the manual workspace, it's more difficult
[00:48:58.240 --> 00:49:00.080] 'cause you needed to have robotics and things like that,
[00:49:00.080 --> 00:49:01.520] high capex, right?
[00:49:01.520 --> 00:49:05.320] There is no capex required to do this at a base level.
[00:49:05.320 --> 00:49:06.760] When you wanna build your own custom models,
[00:49:06.760 --> 00:49:08.480] yeah, it costs millions of dollars.
[00:49:08.480 --> 00:49:10.160] But only a few companies will do that,
[00:49:10.160 --> 00:49:12.160] which is why our focus is on a few companies.
[00:49:12.160 --> 00:49:15.240] And for everyone else, it's just making these models usable.
[00:49:15.240 --> 00:49:18.560] - What concerns you the most about all of this
[00:49:18.560 --> 00:49:22.840] being out in the world now from a societal impact standpoint?
[00:49:22.840 --> 00:49:24.600] - I mean, look at the unknowns, right?
[00:49:24.600 --> 00:49:26.360] Nobody really knows what's gonna happen.
[00:49:26.360 --> 00:49:29.080] The bad guys already have this technology.
[00:49:29.080 --> 00:49:31.160] I should know, for a number of reasons, right?
[00:49:31.160 --> 00:49:34.040] And I don't know if we'll be able to catch up
[00:49:34.040 --> 00:49:36.840] enough as a society to the bad actors
[00:49:36.840 --> 00:49:39.680] who have better versions of this technology,
[00:49:39.680 --> 00:49:42.680] but then also some of these knock on effects,
[00:49:42.680 --> 00:49:44.920] like, you know, anyone can create anything,
[00:49:44.920 --> 00:49:47.000] anyone can write anything.
[00:49:47.000 --> 00:49:49.840] Instantly, like, there's no more barriers to this.
[00:49:49.840 --> 00:49:51.400] What are the knock-on, knock-on effects of this?
[00:49:51.400 --> 00:49:53.240] I don't know, nobody knows.
[00:49:53.240 --> 00:49:54.840] So that's the thing that worries me a lot,
[00:49:54.840 --> 00:49:56.560] but then the other part of me is like,
[00:49:56.560 --> 00:49:59.680] the alternative is this technology is only controlled
[00:49:59.680 --> 00:50:02.680] by large organizations and they're full of good people,
[00:50:02.680 --> 00:50:04.320] but they do bad things,
[00:50:04.320 --> 00:50:06.680] and they will use it to serve us more ads.
[00:50:06.680 --> 00:50:08.920] Or we could use it to activate humanity's potential
[00:50:08.920 --> 00:50:11.480] by bringing this to the world and having an open debate.
[00:50:11.480 --> 00:50:14.400] And so when challenges come up, we can react to it together.
[00:50:14.960 --> 00:50:16.320] You know?
[00:50:16.320 --> 00:50:18.200] So that's how I try to mitigate against that.
[00:50:18.200 --> 00:50:20.440] That's kind of my approach of philosophy.
[00:50:20.440 --> 00:50:25.520] - Who's the most forward thinking in terms of the risks
[00:50:25.520 --> 00:50:27.960] and trade-offs around all of this?
[00:50:27.960 --> 00:50:31.480] Because as you mentioned earlier, ethics, morality, laws,
[00:50:31.480 --> 00:50:34.200] all of those things are very, very different
[00:50:34.200 --> 00:50:38.640] in the UK versus the US. I think I've heard you reference,
[00:50:38.640 --> 00:50:40.440] like, there's no absolute moral framework
[00:50:40.440 --> 00:50:41.480] for things in the world.
[00:50:41.480 --> 00:50:46.480] And like most people that do harm or ill as we view it,
[00:50:46.480 --> 00:50:49.600] they can talk themselves into anything
[00:50:49.600 --> 00:50:51.200] and believe in anything, right?
[00:50:51.200 --> 00:50:54.320] So I guess I'm interested in who's at the forefront
[00:50:54.320 --> 00:50:57.800] of thinking through some of these things
[00:50:57.800 --> 00:51:02.320] and what's gonna be the governing body in your mind
[00:51:02.320 --> 00:51:06.000] that actually comes to these types of decisions?
[00:51:06.000 --> 00:51:07.640] - I've been very disappointed.
[00:51:07.640 --> 00:51:10.160] I've not met anyone who's really thought this through.
[00:51:10.160 --> 00:51:13.760] The classical thing is either massive techno optimism.
[00:51:13.760 --> 00:51:16.680] Everything will be absolutely fine or massive
[00:51:16.680 --> 00:51:17.600] ultra orthodoxy.
[00:51:17.600 --> 00:51:20.480] This technology is too dangerous to ever release.
[00:51:20.480 --> 00:51:23.040] There are very few people who are kind of straddling
[00:51:23.040 --> 00:51:24.200] the line between.
[00:51:24.200 --> 00:51:26.280] I think UK government's probably the most forward thinking
[00:51:26.280 --> 00:51:27.120] on this.
[00:51:27.120 --> 00:51:29.480] The European Union is the most regressive.
[00:51:29.480 --> 00:51:30.960] They're looking to ban general purpose,
[00:51:30.960 --> 00:51:33.960] artificial intelligence and be the regulatory leaders,
[00:51:33.960 --> 00:51:34.880] which is stupid.
[00:51:34.880 --> 00:51:36.840] Europeans will fall behind.
[00:51:36.840 --> 00:51:39.560] The US is trying to figure out where it stands on this.
[00:51:40.280 --> 00:51:41.480] In terms of governments.
[00:51:41.480 --> 00:51:42.680] Like I said, even on the individuals,
[00:51:42.680 --> 00:51:45.680] I just think that it's really complex.
[00:51:45.680 --> 00:51:47.160] And I can't think of a government structure
[00:51:47.160 --> 00:51:48.000] that can handle this.
[00:51:48.000 --> 00:51:49.040] 'Cause one of the questions is like,
[00:51:49.040 --> 00:51:50.520] do we give this to the Linux Foundation
[00:51:50.520 --> 00:51:52.400] or something like that for the decisions?
[00:51:52.400 --> 00:51:53.880] Not really.
[00:51:53.880 --> 00:51:55.000] These are de novo things
[00:51:55.000 --> 00:51:58.200] and you have to kind of make decisions around this.
[00:51:58.200 --> 00:52:01.480] But I think, you know, a lot of the press against us,
[00:52:01.480 --> 00:52:03.400] we had a lot of positive press, a lot of negative press,
[00:52:03.400 --> 00:52:04.240] but at least it's out there.
[00:52:04.240 --> 00:52:05.760] And I like that it's out there
[00:52:05.760 --> 00:52:07.280] because it means that people are having
[00:52:07.280 --> 00:52:08.600] really strong discussions.
[00:52:08.600 --> 00:52:10.600] I think we need to have more structured forums
[00:52:10.600 --> 00:52:12.640] to have these discussions in a proper way,
[00:52:12.640 --> 00:52:14.120] not an emotional way.
[00:52:14.120 --> 00:52:15.360] And I think that will happen,
[00:52:15.360 --> 00:52:17.160] hopefully as these things get more exponential
[00:52:17.160 --> 00:52:18.440] and we will host some of them
[00:52:18.440 --> 00:52:21.520] and we'll invite as many people as we can into this.
[00:52:21.520 --> 00:52:23.280] There's also, like I said, our communities
[00:52:23.280 --> 00:52:25.320] around language, around BioML and others
[00:52:25.320 --> 00:52:27.360] we're spinning into independent foundations
[00:52:27.360 --> 00:52:29.000] to handle small parts of this.
[00:52:29.000 --> 00:52:32.400] So that I'm inviting everyone in on that.
[00:52:32.400 --> 00:52:34.240] So we shouldn't be making these decisions
[00:52:34.240 --> 00:52:35.680] and I shouldn't anyone else
[00:52:35.680 --> 00:52:38.560] about what a benchmark model is.
[00:52:38.560 --> 00:52:41.120] But then for the overall guiding thing,
[00:52:41.120 --> 00:52:43.760] yeah, nobody really knows, unfortunately.
[00:52:43.760 --> 00:52:45.520] - The last four months of your life
[00:52:45.520 --> 00:52:47.120] or five months of your life or whatever it's been,
[00:52:47.120 --> 00:52:49.840] I'm sure have been pretty unusual.
[00:52:49.840 --> 00:52:51.880] What's it been like at a personal level?
[00:52:51.880 --> 00:52:56.800] One, to have the largest open source community in history
[00:52:56.800 --> 00:52:59.680] or at least trending in that way, as well as,
[00:52:59.680 --> 00:53:02.280] I mean, now you've become something
[00:53:02.280 --> 00:53:05.760] of more of a public figure than you were in the past.
[00:53:05.760 --> 00:53:08.240] What has this been at a personal level
[00:53:08.240 --> 00:53:10.280] managing those two things?
[00:53:10.280 --> 00:53:12.800] - It's been really tiring, stressful.
[00:53:12.800 --> 00:53:14.440] I mean, look, I never wanted to be a public figure.
[00:53:14.440 --> 00:53:17.160] I have Asperger's and ADHD, I hate being public.
[00:53:17.160 --> 00:53:19.520] But like this needed to have a figurehead
[00:53:19.520 --> 00:53:21.200] and someone to lay the blame on.
[00:53:21.200 --> 00:53:23.200] And I also got the positives from that, you know,
[00:53:23.200 --> 00:53:24.680] built a great company.
[00:53:24.680 --> 00:53:26.400] I had lots of failures in the past.
[00:53:26.400 --> 00:53:28.560] I'm just trying to do my best
[00:53:28.560 --> 00:53:31.040] because unfortunately a lot of this stuff keeps centralizing
[00:53:31.040 --> 00:53:33.040] so I keep trying to give away a lot of authority
[00:53:33.040 --> 00:53:34.480] and it comes back to me.
[00:53:34.480 --> 00:53:37.440] And that's really heavy and it's heavy burden to bear.
[00:53:37.440 --> 00:53:39.160] But like I said, there are positives
[00:53:39.160 --> 00:53:40.240] that counterbalance that.
[00:53:40.240 --> 00:53:41.320] I hope I do the right thing.
[00:53:41.320 --> 00:53:43.680] But the amazing thing now is that
[00:53:43.680 --> 00:53:45.920] some of the smartest and best people in the world
[00:53:45.920 --> 00:53:47.720] in various sectors are reaching out to us
[00:53:47.720 --> 00:53:49.240] and joining stability.
[00:53:49.240 --> 00:53:51.000] So I think if we improve as an organization
[00:53:51.000 --> 00:53:53.000] and build a great organization people can be part of,
[00:53:53.000 --> 00:53:55.120] you know, like Google in 2012,
[00:53:55.120 --> 00:53:57.440] then maybe this can be dispersed
[00:53:57.440 --> 00:53:59.080] amongst really intelligent people
[00:53:59.080 --> 00:54:01.280] who've got good hearts working in the right way.
[00:54:01.280 --> 00:54:03.280] I do think we need to be more transparent as well.
[00:54:03.280 --> 00:54:05.720] Like there's a tendency to keep things closed
[00:54:05.720 --> 00:54:08.160] for a variety of reasons.
[00:54:08.160 --> 00:54:09.760] And so I hope we can become a really transparent,
[00:54:09.760 --> 00:54:11.720] great organization full of great people.
[00:54:11.720 --> 00:54:12.960] 'Cause then I can go and finish
[00:54:12.960 --> 00:54:14.920] like my video games and things like that.
[00:54:14.920 --> 00:54:17.400] And then take a back seat.
[00:54:17.400 --> 00:54:21.240] - Who do you turn to for, I'm sure you have tons
[00:54:21.240 --> 00:54:24.080] of people willing to offer advice or opinions.
[00:54:24.080 --> 00:54:26.360] Like what is your, the group of people
[00:54:26.360 --> 00:54:29.640] you keep around the table that keep you level headed
[00:54:29.640 --> 00:54:32.560] and balanced and help you drive towards the right
[00:54:32.560 --> 00:54:35.520] North Star, or does it inevitably end up being just
[00:54:35.520 --> 00:54:37.520] your gut at the end of the day
[00:54:37.520 --> 00:54:41.280] in terms of what good actually means in this world
[00:54:41.280 --> 00:54:44.120] of no absolute moral framework around that stuff.
[00:54:44.120 --> 00:54:46.200] - I mean, that's a complex thing, right?
[00:54:46.200 --> 00:54:47.680] Kind of getting into metaphysics and things like that.
[00:54:47.680 --> 00:54:49.240] Now I've got my friends, you know,
[00:54:49.240 --> 00:54:51.600] put a board together because they can give me advice,
[00:54:51.600 --> 00:54:55.360] got some excellent people like Shree and Jim on there
[00:54:55.360 --> 00:54:56.560] for the business side of things.
[00:54:56.560 --> 00:54:58.160] And it's the team as well.
[00:54:58.160 --> 00:55:00.120] Again, I need to communicate better with the team,
[00:55:00.120 --> 00:55:03.280] but they tell me very directly when I'm being stupid
[00:55:03.280 --> 00:55:07.440] or overreaching, or, you know, sometimes a CEO overhypes things
[00:55:07.440 --> 00:55:09.400] 'cause he gets excited.
[00:55:09.400 --> 00:55:11.160] So there's plenty of checks and balances
[00:55:11.160 --> 00:55:13.760] 'cause also I'm quite an approachable person, I hope.
[00:55:13.760 --> 00:55:18.000] But yeah, like I said, ultimately,
[00:55:18.000 --> 00:55:20.240] founder led companies can only sustain for so long
[00:55:20.240 --> 00:55:21.560] and we're in this transition point now
[00:55:21.560 --> 00:55:23.320] where we're gonna become a process driven company
[00:55:23.320 --> 00:55:25.640] and we have to, not just from a business perspective,
[00:55:25.640 --> 00:55:27.440] but also from an impact perspective,
[00:55:27.440 --> 00:55:29.240] given the impact of what we're doing
[00:55:29.240 --> 00:55:31.000] and our place in the ecosystem.
[00:55:31.000 --> 00:55:34.400] Like again, build a great company to help a billion people,
[00:55:34.400 --> 00:55:37.520] but at the same time, this is an important point in humanity
[00:55:37.520 --> 00:55:39.120] given this technology, like I said,
[00:55:39.120 --> 00:55:41.040] next year is gonna take off everywhere.
[00:55:41.040 --> 00:55:43.120] Like every single graphic designer
[00:55:43.120 --> 00:55:44.800] and every single person will use stable diffusion
[00:55:44.800 --> 00:55:46.360] in some way or another.
[00:55:46.360 --> 00:55:48.320] Every single person doing their homework
[00:55:48.320 --> 00:55:49.880] will use ChatGPT.
[00:55:49.880 --> 00:55:52.280] You know, these are big changes that are coming through.
[00:55:52.280 --> 00:55:56.040] And it's been building, I think you've seen it building,
[00:55:56.040 --> 00:55:58.640] but now it's this breakthrough moment that's occurring.
[00:55:58.640 --> 00:56:02.400] - Why do you pick next year as the time
[00:56:02.400 --> 00:56:03.760] that this is all gonna happen?
[00:56:03.760 --> 00:56:07.000] Is it just extrapolating the exponential growth curves?
[00:56:07.000 --> 00:56:10.440] - Real time stable diffusion, apps like Lensa
[00:56:10.440 --> 00:56:12.360] hitting number one on the App Store,
[00:56:12.360 --> 00:56:15.600] showing the value of creativity, like 500 million run rate.
[00:56:15.600 --> 00:56:18.440] And like I said, the final element to that was ChatGPT,
[00:56:18.440 --> 00:56:20.120] whereby every single smart kid I know now
[00:56:20.120 --> 00:56:23.680] is using it to do their homework, at least 80%, right?
[00:56:23.680 --> 00:56:26.160] Like it's good enough, fast enough, cheap enough.
[00:56:26.160 --> 00:56:27.840] And that is the take off point.
[00:56:27.840 --> 00:56:30.320] - Great, well, and my thanks for coming on and doing this.
[00:56:30.320 --> 00:56:31.520] This is super fun.
[00:56:31.520 --> 00:56:33.920] It gives us a lot to think about.
[00:56:33.920 --> 00:56:37.480] So thanks for coming in and answering all those questions.
[00:56:37.480 --> 00:56:38.800] - My pleasure, thanks for having me.
[00:56:38.800 --> 00:56:39.960] Cheers, you take care.
[00:56:39.960 --> 00:56:42.560] (upbeat music)
[00:56:42.560 --> 00:56:47.760] - So that'll do it for the 46th episode of Cartoon Avatars.
[00:56:47.760 --> 00:56:51.640] Thank you to Emad Mostaque for coming on
[00:56:51.640 --> 00:56:54.160] and having that conversation.
[00:56:54.160 --> 00:56:58.000] And thank you to Andrew, Justin, Jenny, Rashad,
[00:56:58.000 --> 00:57:01.200] and everyone that helped out on this episode.
[00:57:01.200 --> 00:57:05.280] Looking forward to seeing everyone next week
[00:57:05.280 --> 00:57:06.920] on the 47th episode.
[00:57:06.920 --> 00:57:09.120] We have a guest that I've long admired
[00:57:09.120 --> 00:57:11.800] and someone that I was definitely excited to have on.
[00:57:11.800 --> 00:57:15.200] And so you'll definitely enjoy that one as well.
[00:57:15.200 --> 00:57:16.600] Thanks everyone for listening.
[00:57:16.600 --> 00:57:19.180] (upbeat music)
[00:57:19.180 --> 00:57:21.760] (upbeat music)
[00:57:21.760 --> 00:57:31.760] (upbeat music)
[00:57:31.760 --> 00:57:34.340] (upbeat music)