Epoch AI is a non-profit research institute investigating the future of artificial intelligence. We examine the driving forces behind AI and forecast its economic and societal impact. In this podcast, our team shares insights from our research and discusses the evolving landscape of AI.
We are saying that by the end of the decade, we're expecting this 10,000 times increase, in compute.
Ege:When Moore proposed this law in the 1960s, was it really obvious that that line would continue? It's been continuing ever since. The world is so complicated. There are all these different forces at play. So how come all of these somehow average out into just a straight line?
Tamay:We had dinner with a development economist, and we were telling him that we think it's plausible that we could get a 10x increase in the growth rate, to 30% per year. And then he responded, you mean 30% per decade? No. No. No.
Tamay:30% per year. So we've been working on understanding trends in AI, and I'm curious what your kind of vision is for our work about these trends in AI.
Jaime:So think about Moore's Law. Moore's Law is this empirical regularity that was observed about how performance in semiconductors was increasing year after year. That was hugely influential in determining the direction of technological fields. People were able to plan various years ahead of the technology's existence because they had this expectation that GPUs or CPUs were gonna get so much better. We're doing the same, but for AI, both on the input side and on the output side.
Jaime:We're mapping this trend or the resources that you need to develop these, AI technologies. And also now trying to understand, what this AI is gonna be able to achieve in the future.
Tamay:So I think we've had quite some success with the work we've done recently. I'm curious to go back and talk about, you know, how we started Epoch and what we were thinking when we, you know, started the organization. On the one hand, you had, you know, people very carefully studying what seemed to me fairly parochial questions in academia. On the other hand, you had people thinking about the very big picture, but doing it very not rigorously, and very much, you know, by analogy, thinking about evolution, or thinking about the human brain, or just what is possible in theory. Maybe doing a bunch of advanced math and trying to figure stuff out from there.
Tamay:But I totally kind of agree that the thing that, I was really excited about was trying to do this very rigorously and carefully, and actually looking at the existing research and try to see if we can extract information from there or build our own frameworks for kind of carefully thinking through these results. And, you know, some of the scaling law work I see as being really quite instrumental in, you know, building up this framework that that we currently have for thinking about kind of advanced AI and development of AI.
Jaime:So I think we should talk about what we mean by by scaling laws. Maybe, Ege, you wanna take this one?
Ege:Sure. I mean, even going back to what you said, I think the idea people had about AI, maybe with a few exceptions, was heavily focused on software and coming up with better algorithms, better methods to try to match or exceed the performance of human brains. But if you make your focus algorithms, then it's actually quite difficult to, like, argue that that's gonna be a thing that's gonna scale predictably or smoothly, or you're going to be able to say something about what AI is going to be capable of in 15 or 20 years. You would go and ask people, okay. So when are we going to get computer systems that can match the performance of humans in vision?
Ege:When will computer systems be able to look at the scene and pick out what are the objects in this scene? And people didn't really have a good methodology for answering this question.
Tamay:There were surveys and people asked researchers how much progress has been made and extrapolated that, but I think that felt really unsatisfactory. Yes.
Ege:Yes. And, the precursor for this work was done by, I guess, maybe Moravec and Kurzweil. They were not, like, rigorous by our current standards at all. I wanna emphasize this. It was a very preliminary kind of work, but they were the only people I think who in the second half of the twentieth century, let's say, were on the ball on this.
Ege:And they said, it's not really so much about algorithms. It's about how much resources you have that you're putting into the algorithms. And the 2 important resources here are, first, how much computational power you're throwing at the problem, and second, how much data you're throwing at the problem. And initially they were more focused on the computational power side.
Ege:They pointed out, and Moravec has some early estimates where he tries to look into how much computation the human visual cortex is doing. And the numbers he comes up with are so much larger than what the computers people had access to in the 1970s or eighties could deliver. So he made the point, well, it's not really that surprising that our performance is extremely underwhelming compared to what humans are capable of. We think a lot of what we do is very easy because it's just so intuitive, we just do it like that. But actually the brain is a very sophisticated, advanced computer and it hides a lot of this complexity from us.
Ege:And Moravec predicted, based on Moore's Law that you mentioned earlier, he just drew that line and he said, when are we going to get to a point where consumer hardware can actually match the amount of computation in the human visual cortex? He was just looking at the computational side of things. And he came up with a date of between 2010 and 2020.
Jaime:Again, a straight line. Like, it feels like this is the recipe for success. You figure out what is the straight line that matters, and you collect the data and just extrapolate.
Ege:I think the important thing here is that it is really kind of surprising, coming from a prior point of view, that the world turns out to be so regular that it is actually well described by straight lines. That's not at all obvious. Yeah. And, like, when Moore proposed this law in the 1960s, was it really obvious that that line would continue at least until 2006, and, depending on how you adjust it after that, it's been continuing ever since? Right? That's actually quite a surprising thing. And I think people find it counterintuitive. The world is so complicated. There are all these, you know, different forces at play. So how come, like, all of these somehow average out into just a straight line?
Ege:I don't really have a good answer to this question, but the fact that it seems to happen so often, and it happens also in AI, is really what gives us most of our traction. And I think when people were thinking about AI in informal terms, well, you can't draw straight lines when you're thinking informally. You have to think quantitatively.
Jaime:What are the lines, when we think about it? What's your bet for what the most important lines are that we ought to be tracking?
Tamay:I mean, I think the one that we've been tracking from the start is really training compute, and training compute has been hugely influential for shaping both the depth and breadth of the capabilities of AI systems. You know, we have some work where we try to decompose the sources of progress in the capabilities of LLMs and of vision models. And for those, we find that the majority, you know, over 50% of the variation in performance, is due to purely scaling pre-training compute. And then the other components are scaling and improving the datasets, improving architectures, improving algorithms, and the implementations of those algorithms on hardware. So I think the most important for a long time has been pre-training compute.
Tamay:Now I think the other trend that we've been tracking, actually we've been tracking this for at least 2 years now, is inference compute. There are scaling laws for the gains from inference compute, which started out with the Andy Jones paper on the game of Hex. And, you know, once we saw that paper, we got pretty excited about the idea of scaling inference compute, not just for game playing systems, but more broadly. And we've been thinking about that as well as an important driver.
Jaime:Alright. So artificial intelligence is achieving more and more in recent years. We now have, AI that you can talk to, that can solve math problems for you, that can code, for you. I think what we need to talk about is, what has been driving this? What are the reasons why, artificial intelligence has been getting better?
Jaime:What are the main highlights?
Tamay:So I think the key drivers have been the scaling up of compute. You know, mostly in pre training, but increasingly also for running inference.
Tamay:So, you know, we did this kind of inaugural project for Epoch where we tried to study the scaling of pre-training compute. And there we saw that from the start of the field of AI, around 1950, to roughly the inception of the deep learning paradigm, the compute used to train these systems was scaling up at a rate roughly in line with Moore's Law. So doubling every 18 months to 24 months or something. And since then, we saw an acceleration. So since the inception of deep learning, we saw this acceleration of training compute increasing roughly 4 or 5 times per year.
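To make those growth rates concrete, here is a small illustrative calculation (round numbers assumed for the growth factors, not our published estimates):

```python
import math

# Illustrative comparison of the two growth regimes described above.
pre_deep_learning_growth = 2 ** (12 / 20)   # Moore's-law-like: doubling roughly every 20 months
deep_learning_growth = 4.5                  # roughly 4-5x per year in the deep learning era

def doubling_time_months(annual_growth):
    """Months needed for training compute to double at a given annual growth factor."""
    return 12 * math.log(2) / math.log(annual_growth)

print(f"Pre-deep-learning doubling time: ~{doubling_time_months(pre_deep_learning_growth):.0f} months")
print(f"Deep-learning-era doubling time: ~{doubling_time_months(deep_learning_growth):.1f} months")

# Compounding 4-5x per year over ~6 years is where the later "10,000x by end of decade" figure comes from.
print(f"4x/year over 6 years: ~{4 ** 6:,}x")
print(f"5x/year over 6 years: ~{5 ** 6:,}x")
```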
Tamay:And this we have found in a kind of later work on language models and in vision models has explained perhaps the majority of the performance improvements in both the kind of breadth of capabilities and depth of capabilities for vision and language models. The other important components have been algorithmic progress. So, these are innovations in architecture, the development of the transformer, for example, ideas about how to scale your compute, your model.
Jaime:Would you include data quality as one of the algorithmic improvements?
Tamay:Yep. So data quality has been a huge driver. Precisely how to quantify this, or precisely how to classify this, is a little unclear, because I think in large part this has been ideas about how to generate and filter, what techniques you apply and bring to bear to produce a really great training dataset, which you can think of as algorithmic innovations.
Jaime:Yeah. I feel like I've been there with you here. Like, people have been talking for a while about, oh, the 3 important things to think about when you think about AI are compute, data, and algorithms. And I think it's right. And I think that what we have been able to do at Epoch is actually show to what degree this is correct.
Tamay:And you can go even more deeply into this breakdown and decompose pre-training compute into better chips, more spending on bigger clusters, longer training runs, and we've been kind of delving into this decomposition even more finely.
Jaime:Okay. So there are these four factors, right: pretraining compute, inference compute, data, and algorithms. If I asked you right now for a split of what percentage of the success of AI in recent years is due to each of these components, how would you describe it?
Tamay:Yeah. So I think, you know, pre-training compute is probably close to 50% or something. And then the algorithms and data quality have been the remaining 50%. You can kind of break that down even more. I think in particular data quality has been perhaps a really big part of everything that's not explained by scaling up the compute and the size of models and so on.
Tamay:So data quality has been very important. And the post-training kind of enhancements. So fine tuning, supervised fine tuning on instruction data, or fine tuning and post-training on very high quality coding or math problem examples, and making the model really great at solving problems of the sort that end users face when they want to use the model. I think that's been a pretty important component too.
Jaime:Yeah. And chain of thought, which links into this notion of inference compute. If you let the models think for longer, they are able to solve more problems. Said now, it feels almost intuitive, but it feels like we really were ahead of the curve in, like, having this notion internalized at Epoch.
Tamay:Yeah. I think in our inference-training compute trade-off report that we wrote, we made this claim that it seems likely to us that you can scale up the inference compute of large language models. And by the way, this was before people were rigorously doing this. I think this was round about the time that the chain of thought paper came out. We said that it seemed plausible or likely to us that you can scale up inference compute by 2 orders of magnitude and get the equivalent performance gain of scaling up your pre-training compute by, say, one order of magnitude.
Tamay:And I think, say, the release of o1 has really validated that point.
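A toy way to express that trade-off (an illustrative exchange-rate model with an assumed square-root exponent, not the fitted relationship from the report):

```python
def effective_pretraining_compute(pretrain_flop, inference_multiplier):
    """
    Toy exchange-rate model: scaling inference compute by 100x is treated as
    roughly equivalent to scaling pretraining compute by 10x, i.e. an exponent
    of 0.5 in log space. Purely illustrative.
    """
    return pretrain_flop * inference_multiplier ** 0.5

base_pretrain = 1e25  # hypothetical pretraining budget in FLOP
for inference_multiplier in [1, 10, 100, 10_000]:
    effective = effective_pretraining_compute(base_pretrain, inference_multiplier)
    print(f"{inference_multiplier:>6}x inference compute ~ a {effective / base_pretrain:.0f}x bigger pretraining budget")
```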
Jaime:That's right.
Ege:I guess another thing I would point out is the basic picture of compute, data, and algorithms. I think if you took that picture back to the eighties or nineties, I don't think people would have really disagreed with you. And I think the important surprise, maybe, is that the other parts of this equation, meaning most importantly the algorithms part, which I think was the part people really had serious doubts about, also basically turn out to grow smoothly and relatively predictably. This is not just the case in AI.
Ege:I think it's a broader fact.
Tamay:Just to clarify, you mean algorithms are getting better at, like, a very predictable rate?
Ege:Yes. You can imagine a world in which for 10 years there's very little progress. And then in 1 year, some team or some lab makes a critical discovery, and then suddenly models are 100 times more efficient. But actually the picture we see is more that algorithmic progress follows a very similar kind of pattern to Moore's Law, where every year we see, say, a halving, or a little bit more than a halving, of the compute requirements to reach a fixed level of performance. And I think if you told this to people 20, 30 years ago, they would be quite surprised. This is not, I think, a result that they would have expected.
Jaime:How convinced are we about the smoothness? I wanna inquire about your intuitions here.
Tamay:Yeah. I think, you know, we did some work about understanding the distribution of different innovations and, you know, how kind of fat tailed this distribution is. And, you know, we did some work on improvements in performance in, say, vision models and other models. And there we find that there's this power law where most of your improvements are driven by smaller innovations that are kind of more marginal. And then there are occasional, you know, big insights that do push the field forwards.
Tamay:I think, you know, averaging over, like, a year's worth of innovation each year gives you a fairly predictable, fairly regular, fairly smooth curve. I think that seems basically right to me. So I would expect the slope to be broadly accurate to within a factor of 2 or something; if you average over what all the top labs are doing, I think you can get within a factor of 2 of this kind of overall slope.
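One way to make this concrete is to fold algorithmic progress into an "effective compute" multiplier. A sketch using the rough rates quoted above (both rates are illustrative assumptions):

```python
# Sketch: combine physical compute scaling with algorithmic efficiency gains
# into a single "effective compute" trend. Rates are the rough ones quoted above.
physical_growth_per_year = 4.0     # compute scale-up factor per year
algorithmic_gain_per_year = 2.0    # compute needed for fixed performance roughly halves per year

years = 6
physical = physical_growth_per_year ** years
algorithmic = algorithmic_gain_per_year ** years
effective = physical * algorithmic

print(f"Physical compute scale-up over {years} years:   ~{physical:,.0f}x")
print(f"Algorithmic efficiency gain over {years} years: ~{algorithmic:,.0f}x")
print(f"Effective compute scale-up:                     ~{effective:,.0f}x")
```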
Jaime:Okay. So I think one thing that obviously comes to mind here is: how do we reconcile this with the fact that the AI architectures we know about that have been trained at larger scale, for example the ones from Meta, which are quite open, are quite similar to what was being done to train GPT-2, for example? Like, how do we reconcile this with us expecting this smooth trend of improvements?
Ege:Well, I would say that architecture is just a very small part of what goes into a modern system, even purely in software terms. In fact, Meta and also many other people are willing to release the weights of their model, and of course also the architecture, because you need that to be able to interpret the weights. But they're not willing to share any information about the details of the training process. Things like: how did they initialize their weights, which learning rate schedule did they use, where did they get their data from? I mean, they will sometimes say, we have trained on 15 trillion tokens.
Ege:Okay. What are these tokens? Where did they come from? How do you filter them? They just don't share this information.
Ege:And I think really a lot of the gains are hidden in these points. Another one is the developments in the scaling law literature itself, which has shown that previously people were seriously under-training their models, which means that they were giving the models far less data than the models actually could use to improve their performance. And this was essentially a suboptimal allocation, because you can use a greater compute budget to either have a bigger model or have more data. And these scaling law improvements help us better optimize this allocation. This was another substantial gain in efficiency, and none of these are really about the architecture you use.
Ege:There are some minor improvements in architecture. Maybe you could say that the better embeddings that Meta uses are an improvement. Maybe you could say that grouped-query attention is an improvement over the initial transformer published in 2017. But I think these are really fairly minor improvements. I think that most of the gains are coming from these other factors.
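To make the under-training point concrete, here is a minimal sketch of the Chinchilla-style compute-optimal allocation rule, assuming the usual C ≈ 6·N·D accounting and a roughly constant tokens-per-parameter ratio at the optimum (illustrative values only):

```python
def compute_optimal_split(train_flop, tokens_per_param=20):
    """
    Chinchilla-style rule of thumb: C ~= 6 * N * D, with D/N roughly constant
    (~20 tokens per parameter) at the compute-optimal point. Illustrative only.
    """
    n_params = (train_flop / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for flop in [1e21, 1e23, 1e25]:
    n, d = compute_optimal_split(flop)
    print(f"C = {flop:.0e} FLOP -> ~{n:.1e} params, ~{d:.1e} tokens")

# GPT-3-era models used far fewer tokens per parameter than a rule like this suggests,
# which is the "under-training" that later scaling-law work corrected.
```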
Tamay:I also think there's just a kind of a long tail of very many things. I think people like to, you know, ask when the next transformer is gonna be invented, but I think this emphasis on just these big things, these kind of big sexy breakthroughs, kind of belies the fact that many of these improvements are due to just very many small things. That would be my guess as to how we are making so much progress.
Jaime:I'm really very excited to track this better. I really think we should just do that.
Ege:Oh, yeah. I agree. Tracking this better would be exciting. I do think, though, that if you just compare the architectures, there aren't enough small things that are different to explain the differences we see. Like, there really has to be something else.
Ege:Yeah. A lot of other things, in the background, and I think there are. Yeah.
Jaime:So I think, and I believe you two will agree with me, that despite all the bottlenecks and challenges ahead on getting the necessary power and getting the necessary GPUs, we should be able to keep up the rate of scaling we have had until the end of the decade. And that's, like, 10,000 times more compute than what was used to train GPT-4. And that's a massive gap in performance. To put this into scale, GPT-4 was itself trained with 10,000 times more compute than GPT-2. So that same jump is what we are by default expecting by the end of the decade.
Tamay:How confident are you in this? So, specifically the claim that we'll be able to keep up this 4 x per year that produces 10,000 fold, increase in scale. Are you, you know, 50% sure? Are you more than that, less than that?
Jaime:If there is the willingness to do it, I think I'm pretty confident that you can. So the 2 biggest bottlenecks that we found are: can you get enough power, and can you get enough GPUs? And the power question is an important one, but one that people have already identified as an important problem and that they're already working out a solution for. And there is precedent for a large scale-up of power production in the US and in other economies. So I think that people will just rise to the challenge. It might take a bit of preparation and a lot of money, but they will be able to do that. With GPUs, it's a bit harder to scale, since GPU production is so concentrated. But when we did this investigation, we roughly found that, well, you know, if TSMC keeps expanding its production at the rate that they have historically, and the rate that they expect to expand in the future, that should be enough to meet that demand. Is that right?
Tamay:Yeah. That's right. So maybe let's get into the power component first. Yeah. So, you know, where are we?
Tamay:How much power do training runs currently use? And at what rate is this growing, and what is determining, how much, we should expect the energy needs to escalate?
Jaime:So in order to train models of the class of GPT-4 or Llama 3, you will need between 15 and 30 megawatts of power. To put this in context, 30 megawatts of power is not exactly, but roughly, equal to the consumption of 30,000 households in the US. Now, we're saying that by the end of the decade we're expecting this 10,000 times increase in compute. That doesn't translate into an exactly matching increase in the amount of power that you need for training. There are some mitigating factors.
Jaime:Mainly that we expect hardware to be more power efficient in the future, and also that people will train for longer. So if you train for longer, you need less instantaneous power draw at any single point in time. So if we combine all of that, roughly, the picture that we expect kind of by default, this is the default extrapolation if trends continue, is that we should expect an increase in the power you use for training of a factor of 200, 250 or so. And that's still quite a lot.
Tamay:By the end of this decade.
Jaime:By the end of the decade, like, a 200-fold increase. So that puts you in the range where you're gonna need, let's say, 5 to 15 gigawatts of power, which is a lot. That's more than the output of a single typical nuclear power plant. Though it is also not a lot in the sense that, if you look at annual production of energy in the US, this will still be only a small percentage. So now there's the question of, well, will companies rise to the challenge?
Jaime:Will they be able to expand data center capacity at the rate that is needed in order to keep up? And what we did was this very basic extrapolation of just looking at both historical rates of expansion of power in data centers, and also the plans of the power providers. And, you know, even just taking the baseline, it looks fairly reasonable to do training runs of the scale that will be on trend by the end of the decade.
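A back-of-envelope version of the power extrapolation discussed above, with assumed illustrative factors for hardware efficiency gains and training-run length (the real analysis estimates these more carefully):

```python
# Rough reconstruction of the power extrapolation described above.
# All factors are illustrative assumptions, not exact estimates.
compute_scaleup = 10_000          # target scale-up over GPT-4-class runs by end of decade
hardware_efficiency_gain = 15     # assumed improvement in FLOP per joule
training_duration_gain = 3        # assumed longer training runs spread the energy over more time

power_scaleup = compute_scaleup / (hardware_efficiency_gain * training_duration_gain)
print(f"Implied power scale-up: ~{power_scaleup:.0f}x")        # in the ~200-250x ballpark

current_power_mw = 25             # roughly 15-30 MW for a GPT-4 / Llama 3 class run
future_power_gw = current_power_mw * power_scaleup / 1000
print(f"Implied training power: ~{future_power_gw:.0f} GW")    # within the ~5-15 GW range quoted
```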
Tamay:So one constraint that is quite important here is that if you have a training run that happens within a single data center, or within a single campus of, say, multiple data centers that are geographically together, then you have to kind of source that power from either a single grid or maybe a single power plant. And, you know, as you said, there are a few power plants that are of the scale to be able to give you enough electricity to run, you know, 10 million GPUs or what have you. So we've been thinking about geographically distributed training, which enables you to tap into the energy infrastructure across the country in the US, which then enables you to pull together these very many distributed resources and power plants. So can you say more about how that might enable us to source the energy we need?
Jaime:That's right. So a 5 gigawatt data center, my best guess is that it's possible. There's definitely some rumors going around of people wanting to aim for that scale, but it is stretching it. There's no precedent for facilities that have that level of power consumption. I think the biggest smelters on Earth have a consumption of 2 gigawatts or so.
Jaime:Distributed training is this very appealing alternative if they are not able to fulfill the dream of giant, 10-gigawatt-scale data centers. Because then, as you're saying, okay, I just have many data centers, each of them drawing power from a different source. That relaxes this constraint of where you get your power significantly and makes it so much more affordable and so much easier to coordinate. But now the question is, okay, if you split your training across multiple data centers, can you do this? What are the fundamental reasons you might not be able to do this? I think there are 2 main constraints that you need to consider here, which are bandwidth and latency. Now, bandwidth is not really a constraint in the sense that the solution, if you are lacking bandwidth between the data center campuses, is that you just build more fiber optic. You just increase the bandwidth. The amount that people are spending right now on connecting data centers is a very small fraction of what you spend on the GPUs and the supporting hardware itself. To put this in context, I believe that a fiber optic cable that goes across the ocean costs about a hundred million dollars to set up.
Tamay:The idea here is then, if you're spending $10 billion or $100 billion on your clusters, then, you know, what is a couple hundred million dollars to be able to, you know, connect these and actually orchestrate your training run. I think that seems to kinda match my sense that, you know, conditional on actually scaling this far, the amount of resources that companies would be willing to expend is not going to be a huge challenge or upset to their ambitions.
Jaime:That's right. And then the other question is about whether latency is gonna allow you to do the training. Because the training, at least how we do it right now, is kind of like this sequential processing: you process a batch of data, then you update the model, and then you use this updated model to process a new batch of data. But here, we just did the calculation for, okay, what if we have data centers on different coasts in the US?
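A rough version of that coast-to-coast check, with illustrative numbers (the point is only that cross-country latency is small compared to the time a training step takes):

```python
# Back-of-envelope: does coast-to-coast latency break synchronous training?
# Illustrative numbers; not the exact figures from the underlying analysis.
fiber_distance_km = 4_500          # very roughly, US east coast to west coast by fiber route
speed_in_fiber_km_s = 200_000      # light in optical fiber travels at ~2/3 the speed of light

one_way_latency_s = fiber_distance_km / speed_in_fiber_km_s
round_trip_s = 2 * one_way_latency_s
print(f"Round-trip latency: ~{round_trip_s * 1000:.0f} ms")

# If a single optimizer step on a frontier-scale run takes on the order of seconds,
# a few tens of milliseconds of cross-country synchronization per step is a small overhead.
step_time_s = 5.0                  # hypothetical time per training step
print(f"Latency overhead per step: ~{round_trip_s / step_time_s:.1%}")
```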
Jaime:And it seems like the latencies are manageable. So that will not stop scaling, at least until the end of the decade, on current trends. But now, power is not the only bottleneck. I mean, in a sense, it's the most malleable bottleneck. So really, I think that what we need to figure out is, will we have enough GPUs?
Jaime:And what's your take here Tamay?
Tamay:So, you know, again, if you think that AI is as powerful a force as we think, and as valuable economically, then there should be a lot of effort going into relaxing this constraint and producing enough GPUs to power the training runs. So I think, if TSMC receives enough advance orders, and companies and these hyperscalers are able to convince and signal to TSMC that they're actually willing to pay for large scale-ups of their GPU production, then TSMC, I expect, will be willing to expand its advanced packaging capacity to be able to turn the silicon into advanced data center GPUs. They're already expanding this pretty rapidly.
Tamay:They're building new fabs for packaging to support this increased demand. I think there's been some foot-dragging by TSMC, where they're not fully convinced about the demand for data center GPUs. I think it's possible that these hyperscalers are able to come together and actually commit to paying a bunch of money, and give TSMC confidence that the scale-ups they could invest in are actually worth investing in. And if there's not enough convincing that happens very soon, then we might not produce enough GPUs.
Jaime:Okay.
Tamay:So I think the GPU bottleneck is indeed kind of a challenging one. I do expect, from what I'm hearing, the rumors are that OpenAI and Microsoft are working together on building these $100 billion data centers for doing training. That is very much precisely the thing that we would predict if this trend continues. And there are rumors that people are actually already planning for precisely this. And so I think it's quite plausible that it will in fact happen, but I do think there's some chance that the convincing through this chain of lab to hyperscaler to fab doesn't happen quite at the right time for this to actually go through.
Jaime:Yeah. So let me put this into perspective. Right now, if we put all the GPUs on Earth together, what would you be able to train? A model of 10^28 FLOP or so, or something of that scale. What you need is that extra order of magnitude, that extra factor of 20, of GPUs to get to the scale that we need.
Tamay:I think it's probably less than a factor of 20.
Tamay:So currently there might be 4 million H100 equivalents scattered between labs, and this is the production that TSMC has been able to achieve over the past couple of years. And so there may be a factor of 10 or something by which they should scale this up, unless you expect that there's going to be this massive consolidation of resources. If there's this massive consolidation, obviously they need to produce less, but I don't expect that to happen. I think there's going to be this kind of distribution across the 3 hyperscalers.
Tamay:And then there's also going to be some allocation between inference and training. Okay. I think like an order of magnitude scale up in production of TSMC is the thing that's needed. I think in order to do this, they need to scale up their packaging and, some of the other kind of high bandwidth memory production.
Tamay:And, as you said, they're on trend with doing that. So if you read their annual reports, they will say that they're scaling up at a rate of about 30 to 60% per year, which would get you precisely the kind of order of magnitude increase by the end of the decade.
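The "on trend for an order of magnitude" claim is simple compounding of those annual rates over the remaining years of the decade; a quick illustrative check:

```python
# Compounding the stated 30-60% per year production growth over ~6 years.
years = 6
for annual_growth in [0.30, 0.45, 0.60]:
    total = (1 + annual_growth) ** years
    print(f"{annual_growth:.0%} per year for {years} years -> ~{total:.0f}x total scale-up")
# The middle-to-upper end of the quoted range compounds to roughly an order of magnitude.
```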
Jaime:So we're gonna have enough GPUs, we're gonna have enough energy. I guess the other question people ask us is: are we gonna have enough data? Are you gonna be able to get the data that you have to train the model on?
Tamay:So we've chatted about, power and GPU production. And then there's the data wall that people are, often talking about. There's Sam Altman tweeting tear down this data wall or whatever. We've been doing a bunch of thinking on this question, and we've done some of the early work, that we've been continually updating, to contribute to this picture of where the data is. So Jaime, is this data wall real?
Tamay:Are we going to hit this data wall anytime soon?
Jaime:So let's go back to the basics. Where are the companies getting the data that they're training on? They're getting it from the Internet. How much data do they need in order to train things like Llama 3? I think they were trained on something like 10 to 20 trillion words of content, which is a lot, but this is still only a fraction of what exists out there on the Internet.
Jaime:Now, not everything that's out there on the Internet is gonna be good for you. Some of that content is just very low quality, and you're not gonna be able to train on it. Perhaps a way of thinking about how much more data we're gonna have out there is: primarily, I expect that a lot of this data comes from efforts like Common Crawl that go through the Internet, gather a bunch of data, and create an index that's easy to crawl and filter for quality. And Common Crawl, our best estimate is that it only encompasses a fifth of the Internet or so. So, in theory, if you did more exhaustive crawling, you should be able to get 5 times more data.
Jaime:Even if you still apply the quality filters that you want, to make sure that you end up with data that is of high quality. And there is another wrinkle here, which is that at the moment the state of the art in training is that you train once on each data point, because you're not data limited. Right? So it's better to just train on a more diverse dataset. But once you get to the point where you are really struggling to get more data, then what you might do is just train a few times on each data point during pre-training.
Tamay:I think some of the labs are already starting to train multiple epochs, especially on the very high quality stuff. There's been this research on multi epoch training and there the picture is that you can train basically up to something like 10 epochs before you really start facing the diminishing returns of seeing the same data over and over again. Before then, it's as if you have a fresh dataset. So you can multiply by a factor of, I don't know, 4 or something relative to maybe the 2 epoch training that people are doing right now. And that gives you already a factor of 4 of, like, the effective dataset size.
Tamay:And then maybe you can train for even another factor of 2 and still get some gains from that compute spend.
Jaime:Yeah. And take into account that the amount of compute that you use scales with the square of the amount of data that you have. So you don't need that much more data in order to get to that 10,000 times larger model than GPT-4 in terms of compute. You just need, like, a 100 times more data. You might be able with text to almost get there, but you might struggle a little.
Jaime:So, right now, what we were saying is that you might get 4 or 5 times more data from just scraping the Internet more exhaustively, and also 4, 5 times more data from training multiple times on each point. But that's like a total 25x increase. And how do we get to the 100x increase? Right now, I think we have 2 main hypotheses of what you do here. One of them is you start training on modalities other than text, and the other one is you train on synthetic data.
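To make the gap concrete, a sketch that combines the square-root relationship between compute-optimal data and compute with the multipliers just discussed (all factors illustrative):

```python
# How much more data does a 10,000x compute scale-up call for, and how far do
# the mitigations discussed above get us? Illustrative accounting only.
compute_scaleup = 10_000
# At a compute-optimal allocation, data scales roughly with the square root of compute.
data_needed = compute_scaleup ** 0.5
print(f"Data needed: ~{data_needed:.0f}x")                    # ~100x

more_exhaustive_crawling = 5      # Common Crawl covers maybe a fifth of the web
multi_epoch_training = 5          # repeating data a handful of times before returns fall off
data_available = more_exhaustive_crawling * multi_epoch_training
print(f"Data from crawling + epochs: ~{data_available}x")     # ~25x

print(f"Remaining gap: ~{data_needed / data_available:.0f}x -> other modalities, synthetic data")
```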
Tamay:Yeah. And, you know, multimodal training is just very useful. So you have OpenAI and others exploring multiple modalities for inputs and outputs. And, I think they're finding, a bunch of interesting use cases. I expect that those use cases will be sufficiently valuable to fold in a bunch of image, video, audio into the training data.
Jaime:Yeah. Just think about the Claude web browsing stuff. The web browsing stuff right now, one of the reasons why I think it's not very good is because it doesn't quite parse the screenshots that it's being given.
Tamay:And the the web browsing stuff you're referring to is like taking control over the user's computer
Jaime:That's right.
Tamay:And then, you know, performing some tasks, like troubleshooting your computer if you have an error message or something like that.
Jaime:That's right. So multimodal training, I agree, it's gonna be hugely useful. It's something that the companies will want to do. And I think it might be reasonable to expect that if you train on other modalities, your model is also gonna get better. Here the evidence is a bit sketchier, but I don't know, it seems intuitively reasonable that by watching videos of people dropping objects and such, you might learn about physics, and you might learn about how the world works in a way that you cannot from just reading text. I'm not super convinced by that, but I could see it as something that ends up working for this scaling. But then there's the other question here, which is synthetic data.
Tamay:Yeah. So synthetic data people are definitely, quite bullish on it, at least for specific capabilities. So you know, getting really good at producing, at writing good code, or solving math problems, or tool use.
Tamay:So having the model generate a bunch of instances of good reasoning or good application of various techniques and then training the model on this data set. I think people are using this quite a lot already, and there might be a more general sense in which you can use synthetic data to build these very basic representations just like pretraining, does with the normal data set that you get from from the Internet.
Jaime:So here's how I think about synthetic data, and it makes me very bullish on this strategy overall. We know that it's possible to distill models. It is possible to take a model that is quite large, that has been trained on a bunch of data, make it produce data, and then you train a slightly smaller model on the outputs of this model and it learns to at least resemble the outputs of the larger model. And then, on the other hand, you have this inference scaling, where if you let models think for longer, they produce higher quality stuff.
Jaime:So this immediately creates this feedback loop where you take the models at the frontier, you use them to generate synthetic data, taking as long as they want. You use them to generate a hundred answers for each individual prompt, and then another model sifts through to find the highest quality answer that has been produced. Or even better, in something like math, you check: is this solution actually correct? And you use that higher quality dataset as the basis for your training, and this is what allows you to keep training for longer.
Tamay:Yep. There's this general principle that often verification is easier than generation. And this is especially true for math, for programming, where you can do unit tests, and then you can generate a bunch of candidate examples and then use this verification, or you check the quality, and you filter out the low quality stuff and use the very high quality, examples as training data. And so this suggests a way of spending compute and turning compute into data.
Tamay:And so if you're going to be data constrained, but you have enough compute, then this suggests that you can balance those by just using your compute and translating that into more examples to train on.
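A minimal sketch of the "generate, verify, filter" recipe, using a hypothetical model call and a toy programmatic checker (the names and the toy task are assumptions, not anyone's actual pipeline):

```python
import random

def generate_candidates(prompt, n):
    """Stand-in for sampling n candidate solutions from a model (hypothetical)."""
    a, b = prompt
    # Simulate a noisy model: sometimes right, sometimes off by a little.
    return [a + b + random.choice([0, 0, 1, -1]) for _ in range(n)]

def verify(prompt, answer):
    """Cheap programmatic check. For math or code (unit tests), verification is
    exact and much cheaper than generation, which is what makes the recipe work."""
    a, b = prompt
    return answer == a + b

# Spend compute generating many candidates, keep only the verified ones as training data.
synthetic_dataset = []
for _ in range(1000):
    prompt = (random.randint(0, 99), random.randint(0, 99))
    candidates = generate_candidates(prompt, n=8)
    verified = [c for c in candidates if verify(prompt, c)]
    if verified:
        synthetic_dataset.append((prompt, verified[0]))

print(f"Kept {len(synthetic_dataset)} verified examples out of 1000 prompts")
```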
Ege:That's right.
Jaime:We know that this is already happening for post-training. Meta was quite transparent: like, we train one version of our model on code data, and then we use it to generate synthetic examples of more code data that we could continue training on. I think the question here is: is this gonna be useful also for pretraining? I think that, yes.
Jaime:And I kind of suspect that already within OpenAI and these large labs, there's already a non-negligible fraction of their compute being dedicated to generating synthetic data.
Tamay:Yeah. I mean, synthetic data is already feeding into pretraining data just by virtue of a bunch of AI output ending up on the Internet. OpenAI is producing 10 or a 100 billion words per year that end up on the Internet.
Ege:I guess I would raise one concern here, which is that if synthetic data only allows you to match the quality of the existing high quality data distribution, then you have this problem with generating a lot of synthetic data. Usually, training on a single token takes about 3 times as much compute as generating a single token. But if you want to get high quality synthetic data, you're going to have to generate a lot of tokens for each token that actually ends up in the training dataset. And that's going to mean that, actually, most of your compute is gonna be spent on generating the data and not training on it.
Ege:In fact, this is what has happened with past approaches that trained the model basically on synthetic data. Maybe the biggest one is the line of work that came out of AlphaGo
Ege:Where it took about a 100 times as much compute to generate the data as the models were later trained on. So this might be a problem. But on the other hand, I think there's a reason to expect synthetic data to be much higher quality than data you just find randomly on the Internet, even if you have filtered that data for quality. And I think one reason is, if you think about how humans learn to do tasks, there are some tasks, some things, you can learn just by reading or watching other people's text or actions or whatever. But actually, for a lot of more complex tasks, for instance, if you want to be a good chess player, if you want to be a good Go player, if you want to be a good programmer, and you can name a lot of other things of this nature, it's not really enough to watch other people do the task, even if you do that a lot.
Ege:But if you practice yourself, you get a lot more gain out of that data, it seems, than you get from just watching other people. So there's a hope here that synthetic data ends up just being much more valuable per bit or per token or whatever. And therefore, you actually don't have to generate anywhere near as much as we currently use for pretraining. And so we can still make it to this deadline by the end of the decade. We can keep up the scaling, which I think might be difficult if you have to rely on synthetic data that's just about the same quality as the existing data distribution.
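A rough back-of-envelope on the compute-cost concern raised above: if training on a token costs about three times what generating one does, the overhead is driven by how many candidate tokens you generate per token you keep (illustrative ratios):

```python
# Relative compute cost of synthetic data generation vs. training on it.
# Rule of thumb used above: training on a token costs ~3x generating a token.
train_cost_per_token = 3.0
generate_cost_per_token = 1.0

for candidates_generated_per_kept_token in [3, 10, 100]:
    generation_cost = candidates_generated_per_kept_token * generate_cost_per_token
    ratio = generation_cost / train_cost_per_token
    print(f"Generate {candidates_generated_per_kept_token:>3} tokens per kept token "
          f"-> generation costs ~{ratio:.0f}x the training compute")
```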
Jaime:Yeah. I think this makes sense, to me. So this is the reason why, I'm personally bullish about synthetic data, and this is the out. Right?
Jaime:If you run out of natural data, this is how I will envision that the companies are going to keep up with scaling the models.
Ege:Yeah. Of course. There are even some examples. For instance, DeepMind recently announced this result, that they were able to get very close to gold medal performance, on the IMO and, that was a combination of 2 systems, both of which were trained on synthetic data. There was a model that just solved geometry problems and a model that did formal proofs.
Ege:And they don't give a lot of details about their approach, but they say they basically used something similar to the AlphaGo approach, which means some pretraining on human data, or existing proofs in this case. In the Go case, before the first version of AlphaGo, they had what we would call pretraining on just human games, which got the model up to something like a decently strong amateur level, but far below the level of professional players and masters. And then, starting from that base, they were able to do self-play and generate synthetic data, exactly as you said: you have this ability to spend more inference compute, generate more high quality data, and then train on that data, and they were able to improve their model's performance by using this method. And I think they used the exact same approach for formal proofs, and I think it's actually difficult to imagine that line of work succeeding just by pretraining.
Ege:I think probably that would not have worked even if they had a lot more resources. So synthetic data is really vital for improving capabilities in some domains. You can get gains from it that you couldn't get from pretraining, even if you scaled up by a 100 or a 1000x.
Tamay:So, Ege, you've recently been thinking a lot about whether there are any limits imposed by, say, the latency of doing training, or by GPU failure rates, or other things that become issues as you scale up your cluster, the number of participating GPUs in training, by an order of magnitude or 2 or more, as, you know, this picture suggests we will if this trend continues. So I wonder if you could say more about what you've been thinking about and what the conclusions of that work have been.
Ege:Yeah. Sure. So the basic way I would try to explain this limit to someone who's just not familiar with any of the details is, currently training runs, of frontier models take about 3 to 4 months. We think, as Jaime has explained before, that this might increase, but, not by a huge amount. So maybe it will go up to 6 months or maybe speculatively up to a year, but probably not much more than that.
Tamay:Can you say more, about why?
Ege:Sure. So one thing is, if you want to make it to the end of the decade, then we don't have that much time. So that's one, you know, like, we can't scale up by that much. But I don't even expect us to get close to that limit, because, first of all, the fact that we have these algorithmic innovations means that, well, imagine that someone started a training run in 2019 that was just finishing about now. Even if they had spent a lot of compute, that model would just not be very useful by now.
Ege:It would be way, way below state of the art. So that means that you're incentivized to not really do training runs that are that long.
Tamay:It's like this question of how you optimally colonize the galaxy. Like, you wanna wait on Earth until your technology improves and your rockets are really fast. You don't wanna launch very early with a very slow rocket, because that's gonna be overtaken by a launch much later with a much faster rocket.
Ege:Yeah, I think that's a very good way of picturing this, actually. But whatever the reasons, let's say that there is some kind of time limit that we have to work with. If you have to do a training run in, say, less than 6 months, then the problem you face as you scale up your training run is the following. The way training works is that you start from some random point in parameter space, so the model is basically giving you random outputs, and then you gradually adjust the model step by step to better predict your data, let's say in the case of pretraining. And each of those steps you have to do in sequence.
Ege:You have to take the first step before you can take the second step. You can't take them at the same time. You can try to just pack more punch into each step, you can try to go further, but there are limits to how much we can do that. That is controlled by just how much information you see when you take each step.
Ege:But at some point you just have seen enough tokens, enough information, and just seeing more doesn't get you further. So there's some limit to how useful each step can be, which means you really have to take some number of sequential steps. And this number, we think, increases as you are training bigger models on more GPUs. The implication is that if your entire time is fixed but you have to take more steps, then each step has to take a shorter amount of time. And you can imagine that if we pushed this out by 10,000 times, a 100,000 times, to keep this trend going by the end of the decade, then we're really going to end up with individual steps that take a very short amount of time.
Ege:And a lot of things that people today don't think of as problems at all, like latency of communication between different GPUs. Today, it doesn't matter at all. They're, like, so far from this being an issue, maybe by a factor of a 100 or a 1000. But, well, if you scale by 10,000x, 100,000x, these things that people currently ignore can suddenly become issues. And we tried to work this out in our recent paper, which I co-authored together with a collaborator.
Ege:We try to work out when these limits would kick in if you look at current technology, and also what kind of improvements we can expect, assuming some reasonable technology improvements by the end of the decade. We end up with the bottom line that, I think, there is no obstacle to keeping up the scaling until the end of the decade. But if you try to go further, then I think we start having problems on the current trajectory. Maybe, over GPT-4, we can get something like a 10,000 or a 100,000 times scale-up.
Ege:Probably 10,000 is more realistic before we start to see a substantial decline in the efficiency with which we use GPUs. What happens is, GPUs currently spend most of their time just doing the computations that are useful for your training run. But once you get to that scale, other things, like the time taken for communication, start dominating. It just becomes harder and harder to coordinate these things, because each step just has to be done in such a short amount of time. And at that point, the efficiency with which you're using your GPUs starts declining.
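A sketch of the step-time arithmetic with heavily simplified illustrative numbers (the paper's analysis is much more careful):

```python
# If the wall-clock budget for a training run is roughly fixed, more sequential
# optimizer steps means less time per step, and eventually per-step communication
# latency stops being negligible. All numbers below are purely illustrative.
seconds_per_month = 30 * 24 * 3600
run_length_s = 6 * seconds_per_month          # assume runs stay under ~6 months

latency_floor_per_step_s = 0.01               # assumed unavoidable per-step communication latency

for num_sequential_steps in [1e6, 1e7, 1e8, 1e9]:
    time_per_step = run_length_s / num_sequential_steps
    overhead = latency_floor_per_step_s / time_per_step
    print(f"{num_sequential_steps:.0e} steps -> {time_per_step * 1000:9.1f} ms per step, "
          f"latency overhead ~{overhead:.0%}")
```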
Ege:But I don't think there's an issue with making it to the end of the decade. And also, I think even after that, there are methods to do what I said before, which is to pack more punch into each step so that you get more value out of it and have to do fewer steps. One of the approaches here, which I think historically has not been promising, is something people have talked a lot about in the past. The way we train these big models is, we just do a very local computation of which direction I should go in to better fit my data. And then you just take a step and another step, another step.
Ege:And people have said, well, the amount of information we're using in these steps is really pretty limited, we can be a lot more sophisticated, and in fact, when people do simpler optimization problems, ironically, they actually use more complicated optimization methods than what we use in deep learning.
Tamay:With second derivatives or something like this.
Ege:Exactly. But the reason these methods have not worked, it's not that they, don't pack as much punch per step. They do. They're better in that respect.
Ege:But the problem is they are much more expensive. And when you work out how much extra value you get per step versus how much more expensive each step becomes, it's just not worth it to use them. So they have not been competitive, but they would be an option if ever the current optimizers hit a limit, which maybe happens after this decade.
Jaime:To summarize, I think what I take away from here is that these bottlenecks, these data movement bottlenecks, are an important problem to solve. But once you have them on your radar, and I fully expect that the companies will already have been thinking about them even before we wrote that paper, they will be able to solve them.
Tamay:I think yeah. So one thing I really like is that we're solving, we're identifying these novel, kind of issues that have not been on the radar of much of the AI community outside of the labs. Within the labs, I'm sure they have a bunch of research where, we're basically replicating things that they've already known for some time but otherwise, without our efforts, I think some of these ideas would have, taken much longer to be uncovered.
Tamay:For these key strategic or conceptual questions and considerations to be uncovered and that's, one thing I'm really excited about with our work.
Jaime:Yeah. Me too.
Tamay:So, you mentioned latency, which is largely a hardware property. Right? And in terms of solutions, you've been thinking about much of the software side of packing more punch in each step. So you expect the bottleneck, the way that we end up overcoming this bottleneck, if we do, to be more on the software side.
Ege:Yes. I think the reason for this is we have examined the historical improvements in the kind of latency that we would care about. And it has been fairly slow. So, for instance:
Tamay:DRAM access latency has improved by only about 30% over the past 2 decades or something.
Ege:Yeah. It's very slow. The same thing is true for some operations that you need to do during distributed training to make sure all your GPUs know what each other are doing. Those operations are about the communication latency between the GPUs. That's also not been scaling very much.
Ege:We've been seeing much more scaling in FLOP per second and also in bandwidth, but not so much in latency, which means when you plot what's been happening to this latency limit, it sort of looks flat. And however much more powerful your processors are, if you have this serial, you-need-to-take-so-many-steps bottleneck, then at some point it doesn't help very much at all. Now, one reason you might be more optimistic is that, yeah, people aren't currently optimizing this very much, but that's because you might argue that it doesn't matter, so there hasn't been much optimization pressure applied to this problem. Even there, I'm a little bit skeptical, for the reason that I agree the latency bottlenecks are basically nonexistent right now at current scales in training, but that is not really true in inference. So if people were able to reduce latency, that would actually have a meaningful impact on inference economics, which does give a strong incentive to do it. Right now, we estimate that the amount of compute the labs are spending on inference is not very much different from, it's the same order of magnitude as, what they're spending on training. So there are big gains to be had there.
Ege:But, despite this, the GPU producers seem to be unable to really get these latency numbers down by very much.
Jaime:Yeah. I find that quite persuasive. Like, I also think of latency as the reason why you cannot just scale up to infinity, at least with the current setup and the way that it works. It is still beyond what is on trend by the end of the decade. So you're still gonna get to the end of the decade.
Jaime:But if you were trying to think of how to schedule a training run that uses a million times more compute than what's expected by the end of the decade, you're gonna run into
Tamay:The laws of physics will eventually get you.
Ege:To begin. Yes.
Tamay:So another constraint that, you know, people sometimes ask about is: if you have 10 million, a 100 million GPUs in your cluster, these have rates at which they fail. So they fail maybe every 10,000 GPU hours. There's some kind of erosion or some kind of issue with the hardware, some flaw in manufacturing, that results in the GPU giving out and shutting down or whatever. So we've recently been thinking about whether this is an issue, and I was wondering if you could summarize whether or not this ends up being a constraint for scaling.
Ege:The short answer is that it doesn't end up being a constraint, even with the present methods that people use. So what you have to do to deal with a failure is, ideally, you need to have an automated process whereby if a particular GPU, or sometimes the CPU or power supply unit or whatever, in a particular machine has failed, you just automatically swap out that machine, replace it with a machine that's functioning properly, and keep your training run going. When you're at really small scales, like, say, below a 1000 GPUs, then the failure rates are manageable enough that often people didn't even invest in automated recovery from failures. What they would do is, when something failed, someone would just be on duty and they would just handle it.
Ege:I've heard stories, I'm not gonna say who told me this, but in some of Google's earlier training runs, they would have people literally inside the data center just running around and replacing faulty interconnects that had failed. They would spot them by hand. That works when you're at a small enough scale, but at a big enough scale you really want something automated. You don't wanna deal with this problem.
Ege:To give some idea of how frequently these failures happen: in the recent Llama 3.1 paper from Meta, they give data about how many failures they experienced throughout their training run. We know how much compute they used, how many GPUs, what their hardware setup was. So our estimate is something like one failure every 50,000 GPU hours. This means that if you have 50,000 GPUs in your cluster, which is a little bit more than what they have, you're getting one failure per hour. So in one day you get 24 failures.
Ege:That's actually a lot if you have to recover by hand. But Meta had an automated recovery mechanism. Their idea was: suppose that some GPU has failed and we need to restart it. If we restart it, we lose the information on that GPU, so we just periodically save the information we need to some remote storage.
Ege:When a GPU has failed, we just restart and recover it and keep our training run going. Right? And you can try to examine when this would become an issue. Well, you can sort of think intuitively that the point at which you start having a problem is when the time between consecutive failures becomes smaller than the amount of time you need to save and recover from this remote storage. But when you work out how much this would be, even for their model, you get to a number like 70 million GPUs.
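The failure arithmetic can be reproduced roughly as follows; the recovery time here is an assumed illustrative value chosen to land near the quoted figure, not the paper's actual input:

```python
# When do GPU failures start to dominate a training run?
mtbf_gpu_hours = 50_000            # roughly one failure per 50,000 GPU-hours (Llama 3.1-era estimate)

num_gpus = 50_000
hours_between_failures = mtbf_gpu_hours / num_gpus
print(f"{num_gpus:,} GPUs -> one failure every {hours_between_failures:.1f} hours "
      f"(~{24 / hours_between_failures:.0f} per day)")

# Failures become a real problem when the time between failures approaches the
# time needed to checkpoint and recover. Assume a few seconds for remote-storage recovery.
recovery_time_s = 2.5              # assumed, illustrative
critical_num_gpus = mtbf_gpu_hours * 3600 / recovery_time_s
print(f"Break-even cluster size: ~{critical_num_gpus:.0e} GPUs")   # on the order of tens of millions
```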
Jaime:Yeah.
Ege:So it's a lot. That is already enough to make it to the end of the decade, and then there's much more headroom. First of all, the fact that you will be scaling your model up gives you an advantage. And there are actually more sophisticated ways you can recover from failures. For example, instead of saving to remote storage, which is going to have fairly limited bandwidth, you can back up to the GPUs you already have in your training run: each GPU can send its state to, say, 3 or 4 other GPUs in the cluster.
Ege:And then you have 3 or 4 backups for everything in your cluster. When you work out the mathematics of this, you see that if you have, like, 4 backups of each GPU, in, like, random places in your cluster, then it's extremely unlikely that all of those are gonna fail at the same time. So it will always
Tamay:As long as the failures aren't too correlated.
Ege:Exactly.
Tamay:And the transfer costs are extremely low.
Ege:That's right. And that gives you much more bandwidth, because you already need a very high bandwidth interconnect between the GPUs in your cluster and between nodes just to facilitate your training run, setting aside checkpointing. You need that high-bandwidth communication anyway. Well, if you already have it and it's already there, then why not use it for checkpointing? It's a very simple idea.
Ege:And when we try to estimate, okay, how big could you make training runs with this kind of strategy, then even with very pessimistic assumptions about how bandwidth per GPU might decline as you scale your cluster, you still get 10 billion GPUs or more. So this is not gonna be a problem. I think almost anything else is more likely to bottleneck scaling than GPU failures.
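A toy version of that mathematics, assuming independent failures. The 4-backup count is the one mentioned in the conversation; the per-GPU failure rate is the one quoted earlier, and the recovery window is an assumed parameter.

```python
# Rough sketch of why a few randomly placed peer backups make state loss extremely unlikely.
# The failure rate is the one quoted above; the backup count and recovery window are assumptions.

MTBF_GPU_HOURS = 50_000

def p_state_lost(num_backups: int, recovery_window_hours: float) -> float:
    """Probability that a given GPU AND all of its backup peers fail within one
    recovery window, assuming failures are independent and rare."""
    p_fail = recovery_window_hours / MTBF_GPU_HOURS   # per-GPU failure probability in the window
    return p_fail ** (num_backups + 1)

# With 4 backups and an assumed 0.1-hour recovery window, losing any particular shard of
# state is roughly a 3e-29 event per window, vanishingly small even across millions of GPUs.
print(p_state_lost(4, 0.1))
```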
Tamay:Nice. It's gonna take a lot of engineering work too, to orchestrate such a training run.
Jaime:Yes.
Tamay:Maybe you'll have robots instead of humans running around swapping out interconnects, and there's gonna be progress we'll have to make in order to orchestrate training runs of this size.
Ege:Oh, yeah.
Tamay:Seems like, there's no kind of in principle reason to expect that
Jaime:Yeah. A fundamental barrier.
Tamay:Yeah. There's no fundamental barrier.
Ege:I think that distinction is very important, because I wouldn't say it's just gonna be very easy to scale, that you just have to buy the GPUs. In practice, each order of magnitude of scaling poses a bunch of engineering challenges. But I think the way we look at this is: imagine that you're spending $10 billion on the hardware in your cluster. With $10 billion you can pay for a lot of engineering time.
Ege:So if it's something you can solve just by having a lot of engineers thinking about the problem, then for us it's not really a significant bottleneck. We care about things that you cannot overcome by spending a small fraction of your budget on engineering or GPU cooling or this or that. Those things just add small constant factors on top of the cost.
Jaime:Well, that small cost factor might be $100 million, to be clear. But it's still small compared to the overall cost.
Tamay:Yes. So our worldview is that AI is going to be so economically valuable, capable of maybe accelerating growth and certainly producing a bunch of output, that these kinds of costs, while large, end up being fairly small relative to the upside of what AI will likely deliver.
Jaime:Let's talk more about this. We are saying that if people want to build these models using 10,000 times more compute than GPT-4, we think it's gonna be possible by the end of the decade. But what does this buy us? What do we expect to see from AI in the coming years?
Ege:And why would people do it? I think, really, this is the important thing. When you look at what's going to limit scaling, yeah, there are things like energy and chip production and so on. You can see some limits there, but the limit that could be most decisive is people just not wanting to spend the money, because we are talking about, by the end of the decade, if you want to keep this up, on the order of $10 to $100 billion being spent on just one training run. And then imagine that across many labs, and imagine all the experimental and inference compute that has to go along with that. You end up with fairly large budgets. So what is this going to get you? Right?
Jaime:And that's the amortized cost of, like, a single training run? So the cost of the cluster is gonna be in the hundreds of billions of dollars? So we expect that artificial intelligence is gonna keep improving at a very fast pace. The natural question to ask here is: why would people pay for it? The scale of the investment that's gonna be needed to build a cluster where you can train a model that is 10,000 times larger than GPT-4 by the end of the decade is gonna be on the order of hundreds of billions of dollars. Why would people do that? What do we expect AI to be able to do, and what impact do we expect it to have, in order to justify such a large investment? I think one fact that's quite useful to keep in mind is just the amount of money that's spent on wages each year. Wages, or human labor, are the most valuable factor input in the economy today. About 60 to 70 percent of total income gets spent on the wage bill globally. So this is like $70 trillion a year.
Tamay:So, if you're able to fully automate human labor, you can capture this enormous sum, and this is a flow rather than a stock. That's extremely valuable: if you do a simple discounted cash flow model of tens of trillions of dollars a year, capturing even a fraction of that is worth investing hundreds of billions, if not trillions, of dollars per year to try to capture.
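A toy discounted-cash-flow version of that point. The roughly $70 trillion wage bill is the figure quoted above; the captured share and discount rate are assumptions chosen purely for illustration.

```python
# Toy discounted-cash-flow illustration: even a small share of a ~$70 trillion/year
# global wage bill has an enormous present value. The captured share and discount
# rate below are assumptions, not figures from the conversation.

WAGE_BILL_PER_YEAR = 70e12   # ~$70 trillion per year, as quoted above

def present_value_of_perpetuity(annual_flow: float, discount_rate: float) -> float:
    """Present value of a constant annual flow continuing indefinitely."""
    return annual_flow / discount_rate

captured_share = 0.05        # assume capturing just 5% of the wage bill
pv = present_value_of_perpetuity(captured_share * WAGE_BILL_PER_YEAR, discount_rate=0.05)
print(f"${pv / 1e12:.0f} trillion of present value under these assumptions")
```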
Ege:There was a recent comment by Masayoshi Son. He was being asked about the current rate of investment into AI, and he said, well, you know, if you can automate even some of the economy, then imagine how much that's worth, billions of dollars, and that's per year. As Tamay said, it's a flow. So he was saying the current rate of investment is, if anything, too low. Right?
Ege:We should expect it to go up over time.
Tamay:And I think that's basically our view. It's kind of funny that we have this agreement with Masayoshi Son, with whom I otherwise don't agree much. But we have the same kind of view that the current rate of investment is just much too low relative to the economic value and the potential that AI holds. I think it makes sense to spend a lot on building out the infrastructure for doing this kind of training and running inference, well in advance of when AI is able to do full automation or even partial automation.
Tamay:I think the reason for that is just that it takes time to build up the relevant capital, the infrastructure, to run and serve models. Things like energy, power plants, fabs, all of those take time to build, and investors don't want to be caught off guard: suddenly you have a very capable model, the value of GPUs skyrockets, and the infrastructure isn't there to support serving this model, or training it if we have innovations in software. And so it'll make sense to spend much more than we're currently spending, well in advance of actually doing substantial automation.
Jaime:So, how far are we from this? Do you think that by the end of the decade, we will already have full automation?
Tamay:Yeah. I think by the end of the decade is probably too soon.
Tamay:So we're projecting this 10,000-fold scale-up, and I think this gets us a long way towards this idea of having a drop-in remote worker. That's the idea that you could have an AI system take control of your computer, or use a computer, to accomplish the types of tasks that remote workers do. So: troubleshoot your computer, remotely log in and figure out what's causing an issue and try to identify and resolve it, or do a bunch of other types of work in IT or finance or accounting and other things that can be done remotely. Existing estimates of jobs that could be done remotely are on the order of 50% of the economy. Notably, this has increased since COVID. Before, economists thought that a lot of work needs to be done in person, and this is one of the explanations for why there are cities of the scale that we see today: that some work needs to be done in person.
Tamay:I think COVID has poked holes in this idea that the majority of work needs to be done in person.
Tamay:So I think close to 50% of things can be automated with drop-in remote workers.
Jaime:Yeah. A big challenge that I foresee here is that you're gonna need to build infrastructure to be able to accommodate these workers. And it seems that once it became clear during the pandemic that people needed to work remotely, we were very quick in building that infrastructure. So maybe the lesson is that we will get better at building this kind of infrastructure pretty quickly. In the same way that we improved technologies for video calls and communication with remote workers quite quickly when it was needed, we might be able to find workarounds, or ways of integrating AI into the economy in a more seamless way. One thing that's important to reflect on is how fast and sudden this change can be. I would imagine that even if we have AI that can already solve very complex math problems, that doesn't mean the world is immediately gonna speed up a hundredfold. What do you envision the process will be from, like, we have this AI that can solve pretty much any task that a human can do, to this AI having a large impact on the economy?
Tamay:Are you referring to just cognitive labor, or also embodied tasks?
Jaime:Cognitive labor. Yeah.
Ege:Yeah, sure. I think a simple estimate would be: you just get a lot of AI workers on the tasks that we can currently do remotely, and the estimate there is around 50%. I think that's a reasonable order-of-magnitude estimate. Then you could at least get a lot of value out of those tasks.
Ege:Insofar as you think the bottlenecks in the world economy are not very strong, and maybe insofar as you think the people who are currently doing those tasks can, over some period of time, acquire new skills and move to the jobs that AIs are not yet able to do, then we should expect that this could double GDP, gross domestic product, or more. Right? So just automating those tasks and moving all those people onto the tasks you have not automated yet could already give a doubling, and probably more than a doubling. But now, how quickly does that happen? I think people disagree about this quite a bit.
Ege:Even inside Epoch there are a variety of different views. Some people think it's plausible that we would get a system that is basically a drop-in remote worker, as cheap as or cheaper than a human, in maybe 10 or 15 years. I think longer than that.
Ege:So I would not expect, over say the next decade, not just by the end of this decade but the next 10 years, to see a very dramatic acceleration of growth. Maybe we would see a few percentage points of increase on average, or something like that. So I'm not expecting anything dramatic. But if you look at the historical trend, growth in advanced economies has been declining.
Ege:So even stopping that trend and starting to increase growth again would actually be a major change; it'd be a big deal in the world of economics. But as we'll maybe get to later, it's actually kind of small compared to the changes we expect to happen after that. Let's dive a bit more into this, because what I think we believe is that for any single task we can think of, for any single benchmark we can think of, we sort of expect that AI is gonna make great strides there. We recently did this very hard math benchmark, and even for that one, it seems like the median expectation at Epoch is that by the end of the decade AI will already be able to solve many of these complex, research-level questions. But there's this gap between that and the picture you painted, where there might be a long tail of tasks, or a long tail of challenges to overcome, before it gets distributed in the economy.
Jaime:What is the distinction? Why?
Tamay:I mean, I think part of this is just that it's been historically very hard to capture in benchmarks the types of tasks that people are doing in the economy.
Tamay:And I think having a benchmark that is able to capture this very well is just extremely hard. So I think it's natural to expect that benchmark progress is going to be much faster. A key reason you might point to is the timescale over which a task is performed. Often in benchmarks you need to do some reasoning, but you can do that reasoning within maybe an hour or so; even for our very hard FrontierMath problems, say, humans would take maybe hours to solve them.
Tamay:And so this is a longer timescale than many of the traditional benchmarks, but still not the timescale over which key economic tasks are performed, which might be weeks or months or what have you. And I think the ability of a model to execute plans over very long timescales is a component that I expect to take somewhat longer than solving these types of benchmarks, even the ones that are very hard.
Ege:I would point to a different phenomenon, which is that there's this paradox in AI called Moravec's paradox: the tasks that a computer or an AI finds difficult are not necessarily the tasks that humans find difficult. And Moravec's explanation for this was that the tasks humans find difficult are tasks we have only started doing recently in evolutionary history, while the tasks we find easy are tasks that, not just us, but lots of other species have done for hundreds of millions of years or maybe longer. Those tasks are much more optimized.
Tamay:So the first class of things would be things like math and programming and chess or Go. And the other tasks, the ones we've been doing for a very long time in our evolutionary history, are walking and fine motor skills and things like this.
Ege:That's right. And to some extent even language, for instance, is now basically something that AIs can do. Well, if you look at the evolutionary history of language, it's actually not that old. Humans are the only species we know of that has language. While if you look at the ability for locomotion and things like that, so many species have that skill.
Ege:Like, even bees and whatever can do it in their own capacity. And we can't even match that yet. Bees have very small brains compared to humans. So you might have expected, yeah, maybe we haven't yet trained models that could be on par with the human brain, but is it really so hard to be even on par with the brain of a bee? I think it's interesting the extent to which progress has been so skewed towards some tasks.
Ege:Progress in robotics in any form, not necessarily in a humanoid form, but even something that could control the body of a bee, has been very, very slow. I think it has been very disappointing. I think the reason for that is probably this argument that those capabilities are just implemented much more efficiently in biology, and our methods right now are quite sample inefficient.
Ege:This is a difference that's very easy to see if you train an AI to play Go or chess or whatever. Yeah, they can play at a superhuman level, but they can only do that after playing millions or tens of millions of games. And, well, how many games can a human even play in their entire life?
Ege:It's not that many. And a human can get to a pretty good level of competence on this kind of task even with a few hundred games, which is extremely difficult for AI systems. So there's this big gap in efficiency. We don't know where it comes from. And I think these kinds of things, where biology is very efficient at doing things that are very old, form a huge chasm. I can't even think of prominent benchmarks that try to probe the capabilities that are very old in humans.
Tamay:I kind of agree that Moravec's paradox applies to things like robotics, and I think that's a very compelling example. But for remote workers, I'm curious what types of tasks, or what types of capabilities or competencies, are really hard for AI systems to match human performance on.
Ege:Yeah. So right now I would return to the same point about sample efficiency. One way I would express this is that it's very normal, when a company hires a new worker right now, especially if it's a fairly junior hire, for them to be quite unproductive for the first few months that they're at the company. That's essentially a training period in which they learn the context they're supposed to be working in. They pick up some skills they didn't have before, and they have to do that with a fairly limited number of samples compared to what we show AIs.
Ege:If you try right now to get an AI to adapt to your specific context, it's actually fairly challenging. There are two approaches we have right now. One is fine-tuning; the other is in-context learning. With fine-tuning you need less data than in pretraining, but still way, way more than you're typically going to have access to in this kind of setting.
Ege:And in-context learning is much worse than what we see in humans. For instance, with math problems: when we were talking with some mathematicians as part of our benchmarking project, they mentioned that when they try to get current models to solve a math problem, sometimes, if the problem is easy, the models solve it and that's great. But sometimes they try an approach that doesn't work. And then the human user says, oh, what you did doesn't work, try something else.
Ege:And often they will sort of go through the motions of trying something else, but they will do exactly the same thing. Right? And, why is this going on? Why is it that models are so bad at learning from their failures? I think this is a little bit of a mystery, but I see this as being a very big gap.
Ege:And I'm not sure right now how that gap is eventually gonna be closed. My guess is that we are not really going to solve the sample efficiency problem. What we're going to do is use our ability to overwhelm the problem with greater resources to just scale past it. We're gonna get a superhuman system, just like we did in chess or Go and other domains, which is still less sample efficient, but because of the sheer intensity of the resource inputs we're giving it, is still more competent. But that does mean you should expect it to take longer to get there than you might otherwise expect.
Ege:Yep.
Jaime:Okay. So let's take it for granted that we have AI that can do any remote job. Let's even go farther and say that it can do any physical job, through robots, for example. What are the implications of this for the economy?
Tamay:Yeah. So we've been thinking quite a bit about this, and one thing you might do is look at standard accounts of economic growth. There are theories of economic growth, especially endogenous growth theory, that place idea generation and innovation as being very central to economic growth. So R&D and learning by doing, producing innovation.
Tamay:And this kind of economic theory tells you that an important fact about the economy is that there are increasing returns to scale. That is, if you double the inputs to your economy, capital and labor being the key inputs, as well as the resources that go into idea generation and innovation, then each doubling produces a greater-than-doubling of outputs. Right? If you double your inputs, your labor and capital, you can basically duplicate the economic processes you have set up, and that gives you basically a doubling of output.
Tamay:And then you get this additional boost from innovation making these processes more efficient, so getting more output per unit of input. And so you have increasing returns to scale, and doublings of your inputs produce greater-than-doublings of outputs.
Jaime:Now, actually, sorry, I'm confused about the point here. The main mechanism through which the increasing returns to scale operate is idea generation?
Tamay:Well, an important component is that for every doubling of capital and labor, say, you get close to a doubling of output, but there's this additional oomph on top of that which takes you to increasing returns. So it's the combination of the returns to scale on the accumulable inputs, capital and labor, these two key inputs, and then, on top of that, the returns to R&D. Those two things jointly determine the returns to scale of your economy. And what this picture says is that if it becomes possible for each of your inputs to be something you can invest in, so you can turn your output, your dollars, into more inputs, then you can get accelerating growth: you double the inputs, that gives you a greater-than-doubling of outputs, which you can then reinvest and kick-start this accelerating growth regime.
Tamay:Now, of course, we're not in that regime, because a central input, labor, is not something where we can turn money into more labor. That's determined by population growth, which is somewhat independent of output, not quite, but at the very least it isn't determined exclusively by how much we're investing. And so
Jaime:And AI changes that.
Tamay:AI changes that, not because it changes human population growth, but because it produces systems that, for all intents and purposes, are flexible substitutes for human labor. And once you have that, then you can invest output into building more compute-related capital, so fabs, lithography machines, data centers, the energy infrastructure to power training and inference, and by virtue of that you can get a greater-than-doubling of output for every doubling of input, and you can reinvest this into the cycle that accelerates growth.
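As a rough sketch of that mechanism in standard growth-theory notation: the functional forms and symbols below are an illustrative textbook rendering of what's being described, not equations taken from the conversation.

```latex
% Output Y from ideas A, capital K, and labor L (with AI, L can now be bought with output).
% Ideas grow with the research input L_R.
Y = A \, K^{\alpha} L^{1-\alpha}, \qquad \dot{A} \propto A^{\phi} L_R^{\lambda}
% Doubling K and L doubles K^{\alpha} L^{1-\alpha} exactly (constant returns in the rival
% inputs) and also raises \dot{A} via L_R, so Y more than doubles: increasing returns to
% scale once labor itself is accumulable, which is what enables accelerating growth.
```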
Jaime:So to paint the picture here: we have this AI. This AI can solve many tasks in the economy, and one of the things it can do is help produce more infrastructure for more AI. It can help build more and more GPUs, which means you then have more digital workers, more instances of the AI that can do more and more tasks. And this is the basic way in which labor becomes accumulable.
Jaime:And some of these digital workers are not only producing output, but also thinking about how to make the whole process more efficient.
Tamay:That's right. And sometimes people say this kind of feedback loop, of better technology enabling you to produce better technology, has always been around, so AI is no different. But AI is different in a very important respect, which is that AI promises to flexibly substitute for human labor, such that there is no bottleneck left that we rely on human population growth to unblock. And once you have that, you get this accelerating growth pattern.
Jaime:But then do you get blocked by other factors of production?
Tamay:You do get blocked by those other factors. Those would be land, or the total energy that you're able to harvest. Maybe there are some scarce inputs, like minerals or rare earth metals. I think that is definitely something that eventually binds, but I think there's going to be a period of time when we have produced this substitute for human labor but haven't yet reached the limits of how much output we can produce.
Jaime:So the basic picture here is: right now, the binding non-accumulable factor is labor. Once that's lifted, there are gonna be others, but until we hit them, we're gonna have this regime of accelerating growth.
Tamay:That's right.
Jaime:Nice.
Ege:Yes. So one way to try to reason about when this might happen is: when we turn something into an accumulable input and we accumulate tons of it, and complementarity between different inputs is fairly strong, we should expect the share of those inputs in total gross domestic product to start to decline. And, in fact, there is maybe some sign of this for labor, where the labor share of output has been declining, though not particularly quickly. If you imagine that we completely commoditize labor, in the sense that we have tons of AI workers and they're very cheap, what's going to happen is that the valuable things in the economy are gonna become precisely those things we cannot scale, like land and energy and whatever.
Ege:Those inputs are gonna become very valuable. But you can just look at the rate at which the labor share has been declining, and it is not very fast, even though the economy has been growing quite fast and we have been scaling labor quite a bit. So you can extrapolate how many more orders of magnitude of increase in labor we could get before, on current trends, the other factors are gonna start to bind and stop additional growth. There's a lot more room. I think economists would also agree with this, in the sense that if you ask them, are we likely to encounter a decisive natural resource bottleneck, or some other bottleneck to economic growth, if present rates continue for, say, 200 years, you're quite unlikely to get the answer:
Ege:no, no, that's gonna happen, we're gonna hit some bottleneck. And all we're really saying is that AI is going to speed this process up, because it's going to allow us to grow population much more quickly.
Ege:It's gonna allow everything else to grow much more quickly too. So we get to the point we might otherwise have reached in 200 years, but in 10 or 20 years of growth instead, something like that. And there are really, I think, no decisive arguments against this in the literature. Now, of course, it sounds like an impossible conclusion, which I think makes people skeptical. And that's, I think, a totally fine reaction.
Ege:Like, you should be skeptical if someone comes along and tells you something totally unprecedented is going to happen over the next 30 years; on its face, that's a very implausible claim.
Jaime:We have been talking about how, through this feedback loop of AI automating more of the economy, with economic output being used to create more infrastructure for AI and more idea creation, we could get accelerating economic growth. I want to ask: do you see this as a given? What uncertainties could one have?
Ege:Yeah. So I guess we don't see it as a given. We have a paper that we published on the subject where we basically say, if we define much faster growth to be around 10x faster than what is typical in the world economy today...
Tamay:30%.
Ege:Yeah, 30% per year. Then we say: if we get AI that is capable of what you said, basically, substituting for humans not only on all the remote tasks but also on all the physical tasks, then maybe it's something like a 50% chance that we would get a 10x increase. So we are uncertain whether that magnitude is possible or not.
Ege:So there are arguments on both sides. For the arguments in favor, Tamay already explained the most decisive one, which is this idea that we get increasing returns to scale: we get more AI workers, we can reinvest in ordinary kinds of physical capital, and we get a lot more researchers, because AIs themselves can do research, which drives growth in what economists call total factor productivity.
Ege:Efficiency with which we use resources.
Tamay:Also, learning by doing is an important thing, not just explicit science.
Ege:Yeah, I agree. There are other factors which contribute to the same kind of increasing returns to scale. Just the specialization of human workers is something that in practice contributes to it: everyone learns a specific thing that they're really good at, and then they trade with each other.
Ege:And that also gives rise to some amount of increasing returns to scale, though generally ideas are the main source of these increasing returns. There are some other arguments too. Even if you don't look at ideas at all and just look at what economists call factor accumulation, so we assume there's no technological progress beyond what we have already seen, but we just keep reinvesting output into more chips and more robots and also more factories and whatever.
Ege:Then the usual opinion in the growth economics profession was that this kind of thing can't really produce long-lasting growth. But I think the reason for that is that we are very used to an economy where labor cannot be accumulated. Once you remove that and labor itself becomes accumulable, then I think this conclusion, that you just cannot get much more growth from factor accumulation alone, is overturned. We can get at least 100x, maybe 1,000x, I think, on top of what we have now. So that's an additional reason to believe this.
Tamay:So one intuition for that is: an H100 GPU costs, like, $30,000. And according to some estimates, at least, it does as much computation per second as, say, the human brain might, although this is very uncertain. But it costs on the order of $30,000, which is lower than, say, the median annual wage in the US, especially among the jobs that we expect AI systems to be disproportionately doing. So within about a year, it'll be able to earn enough money to pay for another copy of itself. And so you very quickly get a doubling on the timescale of roughly a year, from just pure factor accumulation.
Tamay:And so you get this very fast accumulation of these factors, which gives rise to very fast growth.
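A toy version of that payback arithmetic. The roughly $30,000 GPU price is the figure quoted above; the annual wage of the substituted work is an assumed placeholder, and operating costs, depreciation, and falling wages as supply expands are all ignored.

```python
# Toy payback calculation: an accelerator that costs about as much as a year of the
# wages it substitutes for "earns" a copy of itself in roughly a year.
# GPU price is from the conversation; the wage figure is an assumption.

GPU_COST = 30_000          # approximate H100 price, as quoted above
ANNUAL_WAGE = 30_000       # assumed annual wage of the work one AI instance substitutes for

payback_years = GPU_COST / ANNUAL_WAGE
print(f"Years for one GPU's earnings to buy another GPU: {payback_years:.1f}")

# If every unit of AI labor buys another unit after `payback_years`, the stock of AI
# workers doubles roughly on that timescale, which is the ~1-year doubling from pure
# factor accumulation described above.
```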
Ege:That's right. So what is the decisive objection to both of these views? Well, there are some other objections, like regulation: maybe we're gonna choose to slow down growth because we are worried about the risks, or just generically we will be more comfortable if this transition happens slower. That's one possibility. But in terms of the physical possibility of it, as opposed to whether we choose to do it or not, I think the decisive objection is really about these non-accumulable factors. We know these factors exist: land is hard to accumulate, and physical space can eventually become a bottleneck.
Ege:Energy is a limited factor, and some other natural resources could be limited too. And right now these things, even if you sum all of them up, don't account for a big fraction of the world economy. Most of the world economy, first of all, the majority of it, is just labor, and of the remaining part, most is physical capital, both of which we assume will be accumulable in this world of AI. So we try to think about, okay, how much room do we have, even based on naive calculations, to just scale up GDP?
Ege:Is it plausible that land or energy or rare minerals or whatever could be a bottleneck to growth? And I think we basically conclude that eventually, yes, but that point is many orders of magnitude away from where we are now. We've had an interesting experience talking to professional growth economists about this, because they tend to agree with these statements if AI is not in the picture. So if you just ask the question, can we keep up just our current rate of growth for, like, 200 or 300 years?
Ege:They tend to say, yeah, it seems plausible; that's probably their default expectation. But then we ask them, okay, suppose we have these AI systems that can substitute for labor, and we have these reasons to expect an acceleration.
Ege:Then can we get that same amount of growth, but in 20 years? And then I think a lot of them become a lot more skeptical. The reason, I think, is that it just intuitively feels very implausible. We're very used to a world of something like 3% per year growth worldwide. This rate of growth has been fairly stable since the end of the Second World War.
Ege:And economists are just very used to thinking of growth as constant. Even the rest of us sort of assume it implicitly, at least if we don't come from a very poor country that happened to experience very rapid catch-up growth, going from being very poor to something like advanced-economy standards; those people maybe have a slightly different view. But generally, we think of growth as a fairly constant thing. And the basic heuristic is that if someone just comes along and tells you we're gonna see this totally unprecedented acceleration in growth in the next 20 or 30 years, you should be very skeptical.
Ege:Similar claims have been made about other technologies. People have said the same thing about the Internet. People have said the same thing about nuclear reactors, nuclear power. Now some people say the same thing about fusion. And I think we would be skeptical of all of those claims, because those things are not like labor; labor is such a huge fraction of the economy.
Ege:For those other things, there's no reason to expect that increasing their supply by a lot would have such a huge impact on the economy. And we really do think that in this case the arguments, specifically about AI, are very strong. Our best theories of economic growth just robustly predict this, unless you make very unfavorable and, to us, unrealistic-looking assumptions, which is why there doesn't seem to be an obstacle to it being physically possible. Now, of course, eventually we think you are going to hit these non-accumulable inputs. And that connects to something else: people are worried about the impact of AI on wages and employment.
Ege:This is something people worry about a lot. And there is a common argument offered by economists as to why these worries are unfounded or overblown, one that has also been offered in past examples of automation, and it's very simple: humans are always going to have some comparative advantage over AIs, or other kinds of robots, on some tasks or others.
Ege:So the idea is you will always be able to gain from trading with them; we will not be worse off by trading with them. And the way this has manifested in the past is, say, cars become much more efficient than humans at transportation. We get cars, there's no need for horse drivers or horses, and a lot of people can suddenly become unemployed. Or we get spreadsheet software.
Ege:And then suddenly a lot of people whose job it was to multiply and add numbers in corporations are unemployed. But, of course, unemployment doesn't stay high forever. Those people find some new task to do that is not yet automated. And in fact, the fact that the old tasks got automated makes them more productive, so not only do their wages not go down, they in fact go up after they transfer to these new tasks.
Ege:And I think economists have been looking at AI and just seeing it as more of the same. They're saying, well, it's just another kind of automation. It's just going to go the same way we've seen before. But, I think the important difference is that AI is capable of substituting for humans across all of the tasks and that is very different from doing it only on some tasks.
Ege:One way to try to see what would happen is to imagine that we have AI systems that can run on a GPU using, say, as much power as a human uses to survive. A human runs on about 100 watts of power. If an AI can take that amount of power and just do everything I can do, but 10x better, then my wage will never be above a tenth of what I need to even just stay alive. Right? So there will be a big collapse in my wage, at least relative to the energy I need, and that means wages falling below subsistence.
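A rough sketch of that arithmetic. The 100-watt figure and the 10x productivity ratio are the ones used above; the electricity price is an assumed parameter, and hardware capital costs are ignored.

```python
# Toy arithmetic behind the subsistence-wage point: if an AI on a human-scale power
# budget (~100 W) does the same work 10x better, the market wage for that work cannot
# stay above ~1/10 of a human's own energy cost of living.
# The electricity price is an assumed parameter; 100 W and 10x are from the conversation.

HUMAN_POWER_W = 100
PRODUCTIVITY_RATIO = 10            # AI output per watt relative to a human
PRICE_PER_KWH = 0.10               # assumed energy price, $/kWh

human_energy_cost_per_day = HUMAN_POWER_W / 1000 * 24 * PRICE_PER_KWH   # ~$0.24/day
competitive_wage_ceiling = human_energy_cost_per_day / PRODUCTIVITY_RATIO

print(f"Human energy cost of living: ${human_energy_cost_per_day:.2f}/day")
print(f"Wage ceiling for the same work: ${competitive_wage_ceiling:.3f}/day")
```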
Ege:So the reason the economists' comparative advantage argument turns out not to hold is precisely the non-accumulable inputs they point to when they object to the possibility of very fast growth. What happens is that we have so much labor that the value of labor goes down a lot. This doesn't mean humans as a whole are necessarily poorer, because humans can own a lot of capital, a lot of land, a lot of natural resources, and the rents on those things in this future economy we expect to go up a lot. So as a whole, humanity, if it keeps its current property, is actually gonna be much richer. It's just that labor in particular is gonna become much less valuable.
Jaime:So the analogy this makes me think of is aristocracy: in certain countries, the aristocracy used to own land and resources that they had accumulated over generations, and there's some historical wealth that comes with that. We might end up in a similar position, as the aristocracy of the future: because of our historical position and privileges, we were able to accumulate capital and resources that keep increasing and increasing in value. I guess this also invites the other possibility, which is that, I believe, the trend is for historical wealth to vanish, as some members of some generations end up gambling away their money or making poor bets on how they allocate it.
Jaime:And they tend to revert towards not having a special place in the economy. Are you worried that something similar could happen to humanity?
Ege:I would say it's very reasonable to expect that, because, as you said, the base rate is that this kind of built-up wealth does not last for that long. The wealthiest people in the world today are usually not the people who used to be the wealthiest; they're not the descendants of the people who were the wealthiest a hundred years ago. And it's actually a bit puzzling to explain why they can't just keep their capital invested and keep getting high returns on it. I think there are a few reasons.
Ege:So one reason is that wealth just gets divided: one person is maybe wealthy, but they have a bunch of descendants and their wealth keeps getting divided through the generations. Another reason is that if you just passively own wealth and invest it, the returns you can get are usually less than if you combine the role of investor with that of a manager or founder of a company. Those people are able to get higher rates of return on their wealth, which means you might start from a higher base, but you grow more slowly compared to them.
Ege:So they have the ability to overtake you, and that's definitely something to worry about in the case of an economy where AIs just do all of the jobs and humans are on the sidelines, living off the rents from their property. That can last for a fairly long time, but I would expect the AIs to eventually start to dominate the world economy in this sense, at least unless humans are able to take advantage of the much larger amount of technological progress that will be made in this world to increase their own capabilities.
Jaime:Yeah.
Ege:So not just remain on the sidelines.
Jaime:Yeah. This seems pretty much determined to me: as economic time goes on, I expect AI to be a bigger and bigger fraction of the economy, and whatever remaining sliver of wealth humanity keeps is gonna be a smaller and smaller fraction. I guess there is a crucial difference between it being a small fraction but still vastly more than we have now, and it actually shrinking to zero, leaving us in a worse position in terms of wealth per capita than we have now. Do you have intuitions about which of these worlds we're headed towards?
Ege:I think if you manage to retain a peaceful economy in some way, and humans are not expropriated, then there's no reason to worry that humans as a whole are going to become much poorer. We own some wealth now. That wealth, I think, is going to get very high rates of return throughout the period of increased economic growth, so we will end up with a stock of wealth that is worth an enormous amount by our current standards. I agree with you that over time, as a share of the entire economy's wealth, it is going to shrink.
Ege:But the overall economy is going to be growing so much. For instance, if you think about the effect I mentioned, of founders getting a higher rate of return on their investments, they are able to do that because they are in the process creating a lot of value that was not there before. So, yeah, they get a bunch of wealth, but it's not because they have taken wealth that belongs to other people; it's because they have created a lot of new wealth. And that would be my default expectation with AI. I also think it's probably not reasonable to expect, given the amount of technological progress we can foresee, that humans are just going to stay on the sidelines forever and that human capabilities are not going to increase. If you consider the amount of wealth that will be available to at least some humans in this world, there will be very strong incentives for the economy to offer them products that can increase their capabilities, not just for reasons of productivity, but because that might, say, enable them to have much richer experiences, it might provide them
Tamay:Prestige and, you wanna be on the side of the very successful and prestigious types of beings.
Ege:Yeah. So sometimes people wonder, say, what it would be like to see four colors instead of three. Right now we can't offer that experience to people, but in the future we might be able to. And there are a lot of things like this which people might well be willing to pay for.
Ege:And I think the biggest one is maybe an increase in life expectancy. Right now, this is just such a scarce good that no matter how wealthy you are, there is very little you can do with your money to extend your life expectancy. But I think in the future that will probably stop being the case, even if we need to change something about humans in some ways to make them able to live much longer. Right now, part of the problem is that parts of your body just get worse over time.
Ege:They degenerate, and we have no way of dealing with that.
Jaime:How much are we gonna need to change in order to stay competitive? Because if I imagine myself living a million years and seeing four colors, I don't think that gets me to, oh, I'm gonna be able to compete with an AI that can solve math problems that I cannot solve.
Ege:Well, we should get back to the scaling laws. Right? We should think that the amount of power or wealth, or the amount of productivity, that an agent is going to have in this economy should be roughly proportional to the amount of compute that they are able to buy and use. As I think was mentioned earlier, the human brain does about as much computation, we think, as one H100 GPU; it's roughly similar. If you have a total of billions or trillions of these GPUs, then there will be some AI systems that run on millions of these GPUs at once.
Ege:Well, obviously, I think it's very plausible that you're not going to be able to be competitive with those systems.
Tamay:I think there are a bunch of interesting open questions that I'm excited for us to be working on. One is this question of algorithmic progress: what really drives algorithmic progress? Is it downstream from, say, scaling, or is it something you should think of as largely independent from the scaling of hardware? Is algorithmic progress mostly about reducing the cost of attaining already-achieved levels of capability, training smaller models that are as capable as the large models we just recently trained? Or are these algorithmic innovations about pushing out the boundaries of what our AI systems can do?
Tamay:Are these algorithmic innovations mostly in the form of architectures and data quality, or other tweaks? Are they in post-training? And if so, what exactly is driving them? Those are some questions I'm excited for us to work on.
Jaime:Yeah.
Tamay:Other questions are: how fast are we going to get these drop-in remote workers? I think this is a question we can already start to think about now. Claude 3.5 has this computer-use ability, where it can use your computer to accomplish some tasks. We are starting to see the beginnings of drop-in remote workers, and we can try to study how fast they are improving on the types of tasks that actual remote workers perform. And then we can say something about scaling behavior.
Tamay:You know, suppose we increase the amount of compute by 10,000-fold, as we expect to happen by the end of the decade. Will this be sufficient for automating the roughly 50% of work that we think can be done remotely? What is the chronology of this automation? Which tasks are going to go first? And what is the pace of this automation?
Tamay:I think an important question is whether the pace of automation is faster than the pace of retraining. If that's the case, then I think this has really interesting qualitative implications for wage inequality: there are going to be occupations that earn a lot because they have yet to be automated and are complemented by things that have been automated, but humans take too long to enter those occupations because the timescale of retraining is so long. So you might end up with a bunch of people who don't get retrained to perform those tasks, and as a result you get inefficiencies this way.
Jaime:Nice. So I'd be curious again whether you have takes on either of these: what are some key takeaways about AI that you think are important, or what is some future work you're excited about?
Ege:Well, I mean, if someone totally unfamiliar with AI were to come listen to this conversation, the one-sentence takeaway I would give is that the future is gonna be here sooner than you think. And I do think that's a really important thing for people who are not following this space to understand.
Jaime:This has to be part of your lateral career plan. Right? Right.
Tamay:And Ege, you are maybe a bit more skeptical of fast progress than many other people even in our organization. So I think coming from you, this says quite a lot.
Ege:Yeah. I mean, there's a big difference between people who are tracking AI progress, people at labs, or people who are outside labs but sort of in the same social circles, and the general population. The first group just has much more aggressive views about the rate of progress we're gonna see over the next 50 to 100 years, some people even over the next 10 years. I think most people just do not expect anything like the rate of progress that we expect.
Ege:Even economists: if you tell an economist that you expect the rate of growth to plausibly increase by 10x, at least for a while, in the next 30 or 40 years, that this is just a plausible thing that could happen, they are going to look at you like you're crazy.
Tamay:We had dinner with a development economist, and we were telling him that we think it's plausible that we could get a 10x increase in the growth rate, so 30% per year. And he responded, you mean 30% per decade? No, no, no, per year.
Tamay:You mean 3% per year, right? No, no, no, 30% per year. And it took a while to get this through to him, but as soon as he actually realized what we were telling him, he just started laughing
Tamay:And thought this was totally ridiculous. So there is this response from economists of regarding this view as pretty bonkers.
Ege:Yeah. And I would encourage people to keep the very long-run historical view in mind. We have seen a lot of things in history which are, you could even say, more remarkable than what we're talking about here. For instance, maybe the most unlikely, and still not really understood, transition that happened is abiogenesis. We had a planet with no life, and then suddenly it has life.
Ege:And then suddenly you have these bacteria that can double in number every 20 minutes, and they just colonize the entire planet. We take it for granted, but if you look inside a cell it looks so complicated, and that just happened; it's just a thing that can happen. And then, even within human history, we went from the world population doubling every 100,000 years in the foraging era, to something like every 1,000 years after agriculture, to now, where the world economy is doubling every 20 years. So we have seen many orders of magnitude of acceleration in growth before. And this is one of the main reasons why, when people are skeptical, they say, well, in the last 70 years growth has been slowing down; how can you look at that and then say such a big acceleration is plausible?
Ege:I would say, well, you have to take a broader historical view. If growth has been fairly constant for 70 years, then even if you know nothing about AI and just come at it from a pure trend-extrapolation perspective, you could maybe say there's a 50% chance it's gonna last another 70 years, and a 50% chance it won't. And economists, and I think just ordinary people too, are much more certain than that.
Ege:Certain, that is, that business as usual is going to continue. There are some interesting stories of similar views in the past. Before we entered the industrial era, progress was so slow that usually you didn't know it was happening; you didn't really see much progress within one lifetime. So people would take it for granted that technological progress and new inventions were pretty rare. And in the late 17th and early 18th century, people started to notice that there was some real progress happening in the economy.
Ege:They could notice some changes within their lifetimes, and Isaac Newton said this is very strange: we are suddenly seeing this progress, how could this happen? His conclusion was that it must be because civilization had been destroyed before and we were just recovering, because he just couldn't reconcile the rate of progress with the question: if we can progress at this rate, why are we not already extremely advanced? He couldn't explain it. Now we think we have an explanation, these increasing-returns-to-scale growth stories, but I would just advise people to keep in mind that we have seen a lot of big changes before. We're not talking about something that is historically unprecedented in some sense, and that, I think, should make you less skeptical than you would otherwise be. That's really the main high-level takeaway I would want people to have. On the details, as for specific projects that I think could shed more light on the future trajectory of AI:
Ege:I think one concept, aside from everything that has been mentioned so far, that seems like it could be pretty important is this training-inference compute trade-off. This is important for several reasons. One of them is that it seems to play a pretty central role in your story about why synthetic data is such a plausible way to overcome data bottlenecks. And I agree with that view, but I do think this trade-off is just not super well understood.
Ege:Like, I think there's just a lot more work that we could do here, and there are so many implications of this that we've realized over the past 6 months that it's really worth doing some more rigorous research in this area. I'm also very excited about the possibility of just getting much better benchmarks, ones that can actually put up a fight against the enormous rate of progress that we expect. I think people are just not taking this very seriously in a lot of domains. They are just building easy benchmarks that get saturated quickly, like GPQA.
Ege:GPQA is a multiple-choice benchmark. It was created by getting some experts and non-experts, and then telling the experts: write a question that an expert in the same field as you is going to be able to answer with a very high chance, but that a non-expert, even with access to a search engine, is not going to be able to answer with a high chance. But the resulting dataset, in my opinion, was not about the kinds of things those experts are actually useful for in the real world. It was just things like terminology, or formulas that those experts are expected to know but that are not very easy to find by Googling
Ege:them.
Ege:But a model being able to do that in no way means it's even close to substituting for the experts who were actually part of that exercise. So I think benchmarking efforts have been quite poor, and we have, I think, already started to improve upon the state of the art in a big way in math. And I just think we could do much more of that. I just don't see the community paying much attention to this problem. It seems like everyone wants these benchmarks to exist.
Ege:When a new benchmark is released, everyone's like, yes, a new benchmark, it's so hard, so nice. But at the same time, when I look at how many people are working on creating such benchmarks, it seems like so few.
Jaime:Yeah, and in a sense it feels like creating these benchmarks is the obvious thing, right? If you wanna measure how fast AI is improving, what you need to do is create settings in which you can evaluate this rigorously, which is what benchmarks are for.
Ege:Whereas in the case of GPQA, it was marketed as this very difficult benchmark, and then a year or so after its release, models are already at, I don't know, 70% or 80% or something like that. So I think benchmarks are not created with the right framing in mind; people are maybe stuck in a way of thinking about AI that was true 10 years ago.
Ege:They're stuck in, like, let's create these very narrow benchmarks. Let's test if a model can look at a picture and tell a dog apart from a cat. That's the kind of benchmark we used to have. And that's just so primitive and so easy compared to what we need to test and probe for: the capabilities that are going to lead to the big changes we've talked about. So maybe people are just not investing in this because they don't share our view of how big the impacts are going to be.
Ege:And in that case, I think we just have this worldview advantage that we can exploit to build these good benchmarks.
Tamay:Yeah. I think our worldview on this is fairly rich. We've thought about this quite a bit, and I think we have a better sense than a lot of people of precisely the way in which this automation might happen, and the reason why it might result in accelerating technological and economic change. And I want to spend a lot of time just harvesting that worldview and picking out things that we can translate into benchmarks, so that we can understand: if you scale these resources, like compute, data, and so on, how much progress are we making on these tasks? I think that will inform us about how fast we're getting to this world that we expect eventually to reach.
Jaime:Yeah. And I think this also puts us in a much better position to advance the rest of our research agenda. Algorithmic progress will be much easier to study if we have more data on benchmarks, and more generally, the question of how much better these systems get as you scale them is also easier to study if you have this information.
Tamay:And research on predictable evaluation. So, suppose you scale up your model by scaling up your training compute and your dataset, and maybe scale up post-training and inference compute. What exactly is the effect on benchmark performance, or on real-world utility? I think we will be able to develop a much more mature sense of this by investing in evaluations, by investing in good benchmarks.
Tamay:And I'm really excited to be making progress on that kind of research.
Jaime:Yeah. This is the goal for us. We want to be the organization that everyone can look to for understanding how quickly AI is advancing, how quickly investments are growing, and how to relate these in order to think about where it might be headed. We have done a pretty good job here, but I think we are gonna do a much better job in the next year.
Ege:I would agree with that.
Jaime:Okay. So we've talked about many things today. We've talked about the mission of Epoch. We've talked about why we created this organization. We've talked about AI, and how it has been able to achieve many incredible things in the last decade, in no small part due to the incredible scale-up in resources that the field has achieved.
Jaime:We expect this scale-up to continue. We expect these improvements to continue, and eventually to have AI that can substitute for humans, first in remote work and later in practically any kind of economically useful task that you can think of. And if this worldview proves true, we also think this will have dramatic implications for the economy,
Jaime:where economic growth might drastically accelerate. Now, I think we need to take all these lessons and carefully prepare for the future, and that goes through having accurate, up-to-date information on how fast the trends of investment in AI are and how fast the trends of improvement in AI are. That's what Epoch AI is gearing up to do, and I would love for the people watching to come with us on this journey and keep up with the work that we're doing, because I think that by doing so, you're gonna end up being much more informed about AI and the role that it's gonna play in our society.