Epoch AI is a non-profit research institute investigating the future of artificial intelligence. We examine the driving forces behind AI and forecast its economic and societal impact. In this podcast, our team shares insights from our research and discusses the evolving landscape of AI.
You have a new tool. It's a lot of fun to use. But, you know, I think a lot of people are talking about accelerating science. I'm not necessarily skeptical that that's going to happen in some sense, but I think they're talking about it without trying to rigorously run tests as to how much the tools are actually accelerating things. So anytime I think about an open problem, the first thing I do is ask the models for some ideas.
Speaker 1:Yeah. Yeah. They're almost always sort of nonsense. I've never gotten an idea that passes the sniff test for a deep open problem. It's weird that we haven't seen a lot of mildly interesting conjectures resolved by AI.
Speaker 1:I guess we're now starting to see those, arguably. So I think right now, at least, where I expect AI to have a significant impact is that the marginal cost of, like, trying something is getting very small.
Speaker 2:Hello, everyone. I'm Greg Burnham. I'm a researcher at Epoch AI, and this is my colleague.
Speaker 3:I'm Anson. I'm also a researcher at Epoch AI.
Speaker 2:And we're joined today by Daniel Litt. Hi, Daniel.
Speaker 1:Hey. Nice to see you guys and meet you in person. I'm Daniel Litt. I'm a professor at the University of Toronto, in mathematics.
Speaker 2:I wanted to start with sort of a fun one. Could you characterize what's the hardest math problem that AI systems today can solve, however you might think about that?
Speaker 1:That's a good question. Yeah. So I think, oh, okay. So, of course, there are these examples where now every frontier model basically has gotten a gold on the last IMO. And I think that's a pretty good baseline for what you should expect.
Speaker 1:So I think probably later in the discussion, we'll discuss a few open problems that have been resolved either with the help of AI or autonomously by AI. And I think it's probably accurate to say that those are about at the level of some kind of mid-tier or low-tier IMO problem. Gotcha. I think there's some evidence that systems can do a little bit better than that right now. Mhmm.
Speaker 1:And I think, you know, with some work, one can probably even elicit better performance than that from current-gen models. But I think that's overall about the level of a contest problem. Something you would expect a strong high school student or undergrad to solve in a couple of hours seems to be about where the systems are at.
Speaker 2:And one thing you've, you know, stuck your neck out on is that within the next year, we might see, like, mildly interesting conjectures resolved. Maybe you can say what that means.
Speaker 1:Yeah. So for me, I think what I mean by a mildly interesting conjecture is something that someone has stated in print. So at least one person was really interested in it. And hopefully, someone has, you know, spent at least a few hours thinking about it. I think there are a good number of problems at that level.
Speaker 1:I would expect that current-gen systems can resolve some of these things with sort of, you know, pass@1,000,000 or something like that.
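(For readers unfamiliar with the jargon: pass@k is the standard way of reporting how often a model solves a problem within k sampled attempts. A minimal sketch of the usual unbiased estimator, in Python; the specific numbers below are illustrative only.)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k: the probability that at least one of k
    sampled attempts succeeds, given c observed successes in n attempts."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 3 successes seen in 10,000 attempts; estimated chance one of 1,000 draws succeeds
print(pass_at_k(10_000, 3, 1_000))
```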
Speaker 2:Gotcha.
Speaker 1:I think arguably some recent examples actually fit that bill. So maybe that prediction where I stuck my neck out has actually come true. Although there are some questions about exactly, you know How interesting. How interesting such problems are. Right.
Speaker 1:Yeah.
Speaker 2:Very nice. You've actually referred a couple of times already now to sort of the time frame for a human to solve something. Yeah. I was gonna get around to that eventually. Is that a metric you like in general?
Speaker 2:I'm curious what you think about that.
Speaker 1:To be honest, not really. I think that's not a great way to think about difficulty. So right. So an IMO problem, you know, the very strongest high school students in the world have about an hour and a half to solve the problem. That definitely gives you some upper bound on difficulty.
Speaker 1:You know, I think that for a lot of those problems, if you gave them to, you know, a professional mathematician, they would actually take longer.
Speaker 2:Sure.
Speaker 1:And that's just because, you know, they'd fiddle around. They wouldn't be motivated in the same way as in a contest. And also, just solving a contest problem, there's a certain very constrained set of techniques you can use. And, you know, in research mathematics, you're not constrained at all.
Speaker 1:So you don't just try the tricks and see what works. You try everything and you fiddle around and you maybe break out a computer and, you know, work out some examples or whatever. So, yeah, I think difficulty is sort of a funny thing. Like, how do you judge it? And a lot of the time, maybe the best way to judge it is after the fact.
Speaker 1:So you look at the structure of the proof and you say, oh, well, actually, this wasn't that hard. Yeah. I think that's a little dangerous. Right? When we're talking about, like, evaluating a model, I think that leads to goalpost moving.
Speaker 1:Like Sure. The model comes up with a proof and you say, oh, well, that wasn't that hard. You know, I didn't have to do anything. I just pressed a button.
Speaker 3:One thing you've said a lot on Twitter is that the amount of uplift that you get from these systems is actually quite limited. But then I think there are also some other people who feel like the uplift is quite a lot bigger. Mhmm. I'm curious how you would characterize what the difference is. Like, what's going on?
Speaker 1:Yeah. I mean, I do think there are some areas where the models are just better. So, you know, if you're working in optimization, for example, my sense is that there are a lot of people, for example, at OpenAI who are experts in that area. They've generated a lot of data and probably have used their human expertise to Yeah. Guide the models.
Speaker 1:And so I would not be surprised if people, for example, in that area are getting more out of the models. Comparatively, my sense is that in algebraic geometry and number theory, the models are just not that good. I think also there are areas which are more amenable to the tool use the models have access to. So, you know, if you're trying to come up with a counterexample to some inequality or whatever Mhmm. You know, writing code is a very natural Yeah.
Speaker 1:thing to do. If you're trying to come up with a counterexample to a conjecture about the intermediate Jacobian of a cubic threefold, probably there isn't really any code you can Mhmm. Can write.
Speaker 2:So, it will help you.
Speaker 3:Mhmm. That's interesting.
Speaker 1:That said, I also think probably people are just misreporting the Mhmm. Amount the models are helping them. You know, you have a new tool. It's a lot of fun to use. But, you know, I think a lot of people are talking about accelerating science.
Speaker 1:I'm not necessarily skeptical that that's going to happen in some sense, but I think they're talking about it without trying to rigorously run tests as to how much the tools are accelerating things. I'm very happy to believe that some frictions are being removed, but Sure. There are also a lot of bottlenecks in research that I think the models just don't touch. And so are you really accelerating if you, you know, have removed a bottleneck to, you know, opening a paper and finding lemma 3.7 Right. Right. But not Mhmm.
Speaker 1:To, you know, having a good idea.
Speaker 3:Yeah. This is why we're glad to have you tell us what's right and what's wrong. Yeah. One thing I find interesting about that is the typical way I would characterize differences in capabilities, or the spikiness of AI capabilities across domains, is through something like Moravec's paradox.
Speaker 1:Mhmm. Mhmm.
Speaker 3:But then it seems like even within math, you also get this kind of spikiness.
Speaker 1:Mhmm.
Speaker 3:And I guess you were kind of alluding to some of this coming from what training data there is. Some of it is coming from, like, you know, what is more amenable to the AIs. Do you think that's explaining most of it? Do you think there's anything else, or how would you characterize the relative importance of these factors?
Speaker 1:Yeah. I mean, what I do think is, doing research math is a skill that's a little different from contest math in that it's a very high-dimensional skill. So it's not Mhmm. You know, there's not some more or less finite set of known techniques that are useful. I mean, okay, sometimes contest math requires a little bit more creativity, but okay, to be honest, the models have not really succeeded in solving problems where I think one Sure.
Speaker 1:can argue that that's required.
Speaker 2:Yeah.
Speaker 1:So, yeah, I think, you know, a lot of the jaggedness we see in math is just the same jaggedness we see everywhere. I don't think there's anything special. Like, in my view, the biggest obstacle to the models, like, autonomously doing high quality math research is just the same as the basic obstacle to automating anything, which is, like, they can't do sort of long context tasks.
Speaker 3:Mhmm. Mhmm.
Speaker 1:You know, something that would take a human six months. There's just no task that takes a human six months that the models can do. So, yeah, I think once we start seeing the models performing software engineering tasks at that scale, I would not be surprised if they also start doing high quality math research. Mhmm. So I actually just don't think there's anything special about math in this regard.
Speaker 3:Makes sense. Yeah.
Speaker 2:Well, one sort of galaxy-brained take I have, which I hold lightly, is that models are weak in spatial reasoning, spatial intuition. Mhmm. I think that's not the galaxy-brained part. The galaxy-brained part is that that is a big latent factor explaining what they're good and bad at. I do know from hearing narratives of various mathematicians about their thought process that often there is something quite spatial or geometrically intuitive about a reasoning process.
Speaker 2:And I do wonder a little if the AI models are especially good when there's a more symbol-manipulation approach to solving something, coding being the most obvious, but even just working through algebra. That might be.
Speaker 1:To be honest, I'm a little bit skeptical of that kind of explanation. Like, first of all, there's a lot of diversity in how mathematicians Sure. Think about mathematics. So some of us are, whatever, shape rotators. Yeah.
Speaker 1:Some of us are more like
Speaker 2:word cells. I was curious, actually, trying to question this belief: whether there are, what's the word, aphantasic
Speaker 1:There are. Mathematicians. Quite famously. Even geometers. Fascinating. Yeah.
Speaker 1:Right now, there's a huge number of mathematicians working on problems, with lots of different approaches. And that's, you know, partly why there are mathematicians who I think are overall, if you tried to put all mathematicians in a line, way, way better than me at math.
Speaker 2:Yeah. Yeah.
Speaker 1:Nonetheless, I'm proving some theorems Sure. Which they haven't proved or probably couldn't prove in the same amount of time. It's just because we have a different approach. Sure. So I think also, you know, now there are, what, like three or four models that can, you know, reasonably well solve math problems.
Speaker 1:They have slightly different approaches. Sure. A much smaller diversity of approaches than humans. Yeah. And I think we actually see that reflected in the benchmarks.
Speaker 1:It seems like the set of problems that are being solved is like, there's actually quite a lot of commonality between the models Mhmm. Is my understanding from, you
Speaker 2:know Yeah.
Speaker 1:Hearing hearing what you guys have been doing That's what our
Speaker 2:analysis was.
Speaker 1:Yeah. Whereas, you know, of course, all the problems in the benchmarks have been solved by at least one human. Sure. Yep. Yeah.
Speaker 1:So I think, yeah, my sense is that you should think of the model as like a mathematician. And so Yeah. There are certain problems that it'll be quite good at and certain problems that it'll be quite bad at, but maybe we shouldn't read too much into what those problems are. It's just an artifact of the fact that there are, you know, only two or three models to look at at the moment.
Speaker 2:Mhmm. Since you mentioned it, it's a big question that is very hard to answer, but I'm curious. How much transfer do you think we're seeing in capabilities across different subfields of math? So if you get this edge, surely you get an edge from generating synthetic data. Mhmm.
Speaker 2:How big an edge do you get as capabilities grow? Do you not really need that? Or do you have any sense
Speaker 1:of this? Yeah. It's a good question. I mean, I think it's sort of hard to say. My sense is that most of what happens when you try to get a model to prove a statement in algebraic geometry is that it tries to find it in the literature, or, like, you know, it finds something very close in the literature and, you know, just tries to get one or two reasoning
Speaker 2:steps beyond
Speaker 1:that. Like, compared to what happens when you ask it sort of a combinatorics question, it's sort of not, at least in my observation, doing the same kind of, like, you know, real attempt to solve a problem.
Speaker 2:Interesting.
Speaker 1:And I think, you know, compared to a graduate student, like, a graduate student who knows all the stuff the model knows Yeah. About algebraic geometry Yeah. Or number theory or sort of fancy math topics would be able to do a lot more reasoning and, like, really try to prove theorems. So it seems to me like, you know, the models are sort of superhuman in some mathematical Mhmm. Subjects in terms of knowledge.
Speaker 1:Mhmm. But somehow, whatever reasoning capabilities they have, they haven't necessarily learned the same techniques that a student who has that same knowledge base would know. And this is just based on vibes.
Speaker 2:Totally, of course. But I don't know. We've chosen you to hear your vibes. What areas do you have the sense the models are stronger in, like, native reasoning?
Speaker 1:Yeah. I think, I mean, okay. I always say, I don't know about superhuman, but they're definitely super-me at, like, proving an inequality, for example, that kind of thing. My guess is it's just sort of easier to generate data, and there's probably a lot more data in that area than in algebraic geometry.
Speaker 2:And when you say an inequality, are we talking like contest style inequalities?
Speaker 1:Yeah. Okay.
Speaker 2:Not like something a little more interesting or important from analysis, where everything's an inequality.
Speaker 1:Yeah. Yeah. I mean, yeah. Something. Yeah.
Speaker 1:And, yeah, so something where coding is useful. They're Mhmm. They're very, you know Sure. Typically very strong, you know.
Speaker 1:Every once in a while, I'll need to prove an inequality, and, like, you know, now my first step is just to sort of explore what the space looks like Gotcha. By writing some code using a model.
Speaker 2:I think we'll get into this a fair bit more, but you did mention two things I thought were maybe a little in tension. One is that a thing the models are missing is having a good idea.
Speaker 1:Mhmm.
Speaker 2:But another was if they can just do things for six months in many domains, maybe they'll be good in math as well. I say these are in tension because I think there are six-month projects humans do that don't require having, like, brilliant flashes of insight or anything in particular. Planning a wedding or something, which models could not do today and takes months. Like, do you think creativity comes just with time? And so, like, what?
Speaker 2:I don't know. How do
Speaker 1:you Yeah. That's a good question. I mean, I agree that those two statements are kind of in tension. Mhmm. What I would say is I think there's some kind of continuum between just applying a technique that's well known and developing a new technique.
Speaker 1:Sure. Arguably a new technique is, you know, you take a hundred different things and put them together in some way. Mhmm. And, yeah, presumably at least one ingredient to doing that is just time. Sure.
Speaker 1:Fair enough. It's just not clear to me whether it's the only ingredient. I would say my experience of doing math is very rarely that I have a brilliant idea that just solves the problem. Of course, sometimes, you know, you wake up in the middle of the night, like
Speaker 2:I see.
Speaker 1:The problem is solved. But typically, for that to happen, you have to marinate in the problem Yeah. For months. So, like, there is some secret ingredient of time. I don't know that my introspection is well developed enough or trustworthy Sure.
Speaker 1:To say whether that's the only real ingredient. I mean, I think there are okay. Let me amend that slightly. You know, there are other things that happen, like you develop philosophies or analogies, and there's some kind of mystical aspect to doing mathematics that I think we haven't seen the models do, but it's also sort of BS. Right?
Speaker 1:Like, that mystical aspect of doing mathematics is maybe just kind of compressing a lot of ideas that you've read or absorbed into some kind of package that's digestible to humans. So I don't know. Maybe it's close to context compaction or something like that. Interesting.
Speaker 2:Yeah. I guess there are these big analogies for, like, intelligence as search or intelligence as compression or something like that. And we're just better at that for now, and models have been getting better at it.
Speaker 1:Yeah. Generally speaking, I'm also skeptical of those kinds of analogies, but, like
Speaker 2:Well, say say more.
Speaker 1:Oh, I just think, you know, my sense is that there are a lot of ways to be good at math. Like, you know, if you just look at what different people are able to do, there's not that much overlap in capabilities. Mhmm. Like, I don't know that there's any mathematician who can prove the same theorems I'm proving, and there are plenty of mathematicians doing stuff where I think their way of thinking is just quite different from mine.
Speaker 2:Oh, but I suppose, I mean, maybe I've said this poorly, but if I think of, maybe it's not falsifiable, if I think of intelligence as search, then what that sounds to me like is you and a different mathematician are pursuing different heuristic search algorithms or something like that.
Speaker 1:Sure. I think that there may be a way of making sense of it, but I would argue that it's sort of not very contentful. Right? Sure. Like, if intelligence is some hugely high-dimensional space and you just make a name for that space.
Speaker 2:Doing a good job at that. Yeah. I don't know. I can't tell if it's enlightening or not. Could you contextualize the utility you're finding in the context of previous generations of tools?
Speaker 2:Like, literature search is better, but Google Scholar existed, you know, and that was presumably an improvement over card catalogs and Right. Conference proceedings.
Speaker 1:Yeah. I mean, I think right now the tools are on a continuum with previous generations. So maybe in two ways. So first of all, literature search. Yeah. Definitely, the models are now, at least for some literature search tasks, better than Google or better than Google Scholar.
Speaker 1:That probably saves some time. Sure. Is it a lot of time? Well, I don't know. I mean, how much time does it save compared to going to the library and
Speaker 2:Yeah.
Speaker 1:Yeah. Probably a couple hours
Speaker 2:A bit. Yeah.
Speaker 1:You know, every once in a
Speaker 2:while. You know, that kind of 2% long-run productivity improvement, right on trend.
Speaker 1:In general, I think, yeah, those improvements seem to typically be, as you're saying, fairly small. Yeah. I would be skeptical that this is more than a percent or two
Speaker 2:for instance. Like, if AI progress stalled today, you wouldn't expect that we already have baked in an explosion in the quality of mathematics compared to what was developed ten
Speaker 1:years ago. Right. Yeah. I would expect sort of similar productivity growth to what we've seen, which is, you know, maybe attributable to some extent to technology, probably mostly attributable to population growth. You can ask just the same question about Google, like, how much did Google or
Speaker 2:email Totally.
Speaker 1:That kind of thing
Speaker 2:I mean, did you live through that? Did that feel like, you know I saw
Speaker 1:You know, I was born in 1988, and I got my PhD in 2015. So, yeah, Google was already around by the time I started thinking about math. So I have actually asked older mathematicians this question. Yeah. I think the general consensus, just self-reported, is that, like, Google did increase mathematical productivity.
Speaker 1:But it's pretty hard to see if you just, like, try to look sure. I mean, it's hard to come up with a proxy
Speaker 2:Right.
Speaker 1:With which to measure this. But I don't think it's obvious just from vibes, for example, that, like, the advent of Google led to, like, really, you know, a remarkable growth in good new mathematical ideas. I just think literature search is not really where the main bottleneck is. Makes sense. So there's another precursor to the development of these sort of AI tools, which is just the development of computing.
Speaker 1:Sure. So we saw a lot of progress in a lot of different areas, you know, already in the sixties and seventies, with the advent of computers. So, for example, maybe a famous example is Euler had this famous conjecture, the sum of powers conjecture, which is, you know, when is there always a solution to a sum of kth powers being another kth power, for some number of kth powers? I don't remember the exact number. So the first counterexample was found just via computer search, and then maybe more famously, the case of fourth powers was resolved by Elkies in 1988 using sort of very clever computer search.
Speaker 2:But but, like, that method would have been dead in the water without computer search. Even though there was a lot of cleverness.
Speaker 1:Yeah. That's right. Yeah. So he sort of found a way to make these questions accessible to a 1988 computer.
Speaker 2:Yeah.
Speaker 1:Yeah. I think probably now they would still not be accessible to just a naive brute-force search. Yeah.
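(As a concrete illustration of what "found just via computer search" can mean: the sketch below is a naive brute-force search of the kind that turned up the first counterexample to Euler's conjecture, for fifth powers, in 1966. It is emphatically not Elkies' 1988 method, which relied on the geometry of elliptic curves; the function and its parameters are just illustrative.)

```python
from itertools import combinations_with_replacement

def euler_counterexamples(limit: int, k: int = 5, terms: int = 4):
    """Naive search for a_1^k + ... + a_terms^k == b^k with all a_i <= limit.
    Slow by design: it just enumerates, the way early computer searches did."""
    kth_powers = {n ** k: n for n in range(1, terms * limit + 1)}
    for combo in combinations_with_replacement(range(1, limit + 1), terms):
        total = sum(a ** k for a in combo)
        if total in kth_powers:
            yield combo, kth_powers[total]

# Given enough time this finds 27^5 + 84^5 + 110^5 + 133^5 = 144^5,
# the Lander-Parkin counterexample from 1966.
for lhs, rhs in euler_counterexamples(150):
    print(lhs, rhs)
```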
Speaker 2:But no. That's fascinating. But
Speaker 1:yeah. So this was a huge development, I think, in a lot of ways. Like, if we just stopped with existing models Yep. We would see some kind of natural continuation of that trend.
Speaker 2:What would that look like? I guess if right now we mostly use them for literature search and coding, maybe it's the coding aspect, like AlphaEvolve style. What do you think would be the development?
Speaker 1:So what I'm imagining here is, you know, like, sometimes to make progress in mathematics, you have to do a search. Yep. So like this Elkies example for this conjecture of Euler's. And a lot of the time that search has some kind of art to it. Like, maybe you're working through a thousand different examples and you don't have an algorithm to work through each example.
Speaker 1:So each example requires some little idea or Yeah. Executing some standard technique, but, you know, it's hard to write a computer program to do it. So Yeah. In algebraic geometry, you need to work through some examples; there are parts of it you can automate with, you know, a Python program Yeah. And parts of it that require some real idea.
Speaker 1:So I think, you know, probably where the models are now, you could imagine Yeah. Automating with relatively high reliability some of those kinds of example searches.
Speaker 2:That's cool. I see. So these are cases where much of it would have had to be manual, or at least the amount of manual work scaled linearly Right. With the size of the problem.
Speaker 1:And now some of that work can be
Speaker 2:that down.
Speaker 1:Yeah. I think that's something that I am really looking forward to. I think, like Mhmm. You know, sometimes I'll write a paper which is just, like, here's a beautiful construction. Right.
Speaker 1:And, like, to find it, you know, I need to do some search and kind of think about where the right place to look is. And, you know, we already see AlphaEvolve as maybe some baby version of
Speaker 2:this Yes.
Speaker 1:Where we see some kind of automated search aided by a sort of clever LLM. I think that's something where I can imagine it having a really significant impact on mathematics. But I think it would be sort of in the same spirit. Yeah. It wouldn't mean automation of mathematics.
Speaker 1:It would be It would
Speaker 2:be a continuation of figuring out how to use computers to reduce labor, open up new things. Right. Yeah.
Speaker 1:If you think about, you know, maybe the proof of the four color theorem or Kepler's conjecture, that's a similar thing where you have so many cases that one needs to check, and you figure out how to get a computer to do that.
Speaker 2:And I suppose you could get lots more of this if you just way amped up the compute. Like, one question we ask about AI all the time is what's compute constrained? Yeah. Do you have a sense of whether there are some areas of math where you could really get some fascinating things if you just gave them, you know, a thousand times as much compute as they have now? Yes.
Speaker 2:Or are these kinda hard
Speaker 1:to find the
Speaker 2:little ones? I don't know.
Speaker 1:Yeah. I mean, a thousand times, yeah. So people actually do this. So if you talk to computational number theorists Yeah.
Speaker 1:Yeah. I've actually had really fun conversations where someone takes a problem and says, oh, well, this problem will be solved in this year, just by taking Moore's law Yeah. Yeah. Yeah. And saying, you know, when will it no longer be compute constrained.
Speaker 1:And those predictions are, you know, reasonably accurate. Yeah.
Speaker 3:Yeah.
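(A toy version of that kind of Moore's-law extrapolation, for concreteness; the numbers and the two-year doubling time are made up for illustration, not taken from the conversation.)

```python
import math

def year_feasible(ops_needed: float, ops_per_dollar_now: float,
                  budget_dollars: float, doubling_years: float = 2.0,
                  start_year: int = 2025) -> float:
    """Back-of-envelope estimate of when a fixed budget buys enough compute
    for a problem, assuming cost-performance doubles every `doubling_years`."""
    ops_affordable_now = ops_per_dollar_now * budget_dollars
    if ops_affordable_now >= ops_needed:
        return start_year
    doublings = math.log2(ops_needed / ops_affordable_now)
    return start_year + doublings * doubling_years

# A problem needing ~1e24 operations, with ~1e13 ops per dollar today and a $100k budget:
print(round(year_feasible(1e24, 1e13, 1e5)))  # prints roughly 2065
```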
Speaker 1:Maybe you know about this example. Can you write an integer as a sum of three cubes? Sure. Yeah. So here we sort of know exactly how hard, at least conjecturally, each instance of this problem is.
Speaker 1:So you fix an integer and try to write it as a sum of three cubes. We kind of know about how big to expect those cubes to be. Yeah. I see. But, you know, I think it's cool to do that for numbers where it's hard, but, you know, you double the compute, triple the compute, or multiply it
Speaker 2:And you're not otherwise. Yeah.
Speaker 3:Like
Speaker 1:Yeah. You get a few more interesting integers, but, yeah, there's a question as to what extent that constitutes progress, in my view. I think there are definitely people who are more excited about it than I am.
Speaker 2:Fair enough. That makes sense.
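(For the flavor of the sum-of-three-cubes search just mentioned, here is a minimal naive sketch. The record-setting searches, such as the ones that resolved 33 and 42, used far more sophisticated algorithms and vastly more compute than this.)

```python
def sum_of_three_cubes(k: int, bound: int):
    """Naive search for integers x, y, z with x^3 + y^3 + z^3 == k and
    |x|, |y|, |z| <= bound. Returns one solution, or None."""
    cubes = {n ** 3: n for n in range(-bound, bound + 1)}
    for x in range(-bound, bound + 1):
        for y in range(x, bound + 1):
            rest = k - x ** 3 - y ** 3
            if rest in cubes:
                return (x, y, cubes[rest])
    return None

print(sum_of_three_cubes(29, 50))  # some (x, y, z) with x^3 + y^3 + z^3 == 29, e.g. 1, 1, 3
```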
Speaker 1:Yeah. I'm imagining things where, you know, you're maybe looking for an example of some construction where there's not just a known algorithm that computes it. Yeah. Yeah. We instead ask, you know, GPT to cleverly search and Right.
Speaker 1:Right. And let it come up with ideas.
Speaker 2:And if you give it enough test time
Speaker 1:Yeah. Maybe eventually it will come up with something. And here, I'm not imagining it coming up with better and better ideas as the test time continues, but just trying more and more things. Sure.
Speaker 2:And presumably you'd have some way to verify, even if it's just more test-time Yeah. Compute saying, does this idea look good, or something Yeah. Like that. Okay. And then I did wanna just cover this. Are there problems that AI is causing right now?
Speaker 2:I mean, I think we know, like, cheating in college is clearly an issue. It's more tempting, a lower bar to cheating Yeah. Sort of thing. Like, what about junk papers? I don't know.
Speaker 2:What's the state
Speaker 1:of this? So I mean, there's definitely junk papers. So I I think I I started counting in maybe September papers on the archive with the the phrase Hodge conjecture in the title or address of the Hodge conjecture is one of the the six remaining open millennium problems. It's the one that's probably maybe the statement is the hardest to understand for a lay person. So for a long time, was safe from from cranks.
Speaker 1:Mhmm. So you just, like, couldn't write anything comprehensible about it. So now that's no longer true. Of course of course, you know, the frontier models can write, you know, reasonable looking Yeah. Text about the Hodge conjecture.
Speaker 1:So yeah. So I think in September and October of this year, there were something like 12 or 13 papers posted to archive, math.h math.algebraicgeometry Mhmm. With Hodge conjecture in the title or abstract, and all but one of them were nonsense. Wow. And I've now, of course, I can't prove that they were generated by by an LLM, but based on the writing style, was quite clear.
Speaker 1:So to be fair, there were maybe, you know, 12 or so papers of this form Right. Right. That came from a smaller number of authors. The number of authors was not 12. It was maybe, like, six
Speaker 2:Oh, I see. Some repeat offenders.
Speaker 1:Yeah. So I mean, to what extent is this causing a problem? Well, you know, it wasted
Speaker 2:Yeah.
Speaker 1:Yeah. Several minutes of my time. Right. Right. As the LLMs get better at writing coherent-looking text, you know, before you would have to spend ten seconds to find the nonsense.
Speaker 1:Now you have to spend a few minutes. So, like, for the most serious offender here, the argument really did not make sense, but, you know, I really had to
Speaker 2:Yeah. Yeah.
Speaker 1:Go to the middle of the paper I see. And see that some statements were just nonsense. Yeah. The introduction was totally reasonable and Interesting. You know, it was making some quite dramatic claims, which is why I was motivated to
Speaker 2:Right. Right. Right.
Speaker 1:To check that it was BS. But, yeah, it wasn't a totally trivial job to do. I think that problem will get harder. Yeah. So in particular in areas where, like, you know, formalization, for example using Lean or other formalization software, is not really practical right now.
Speaker 1:I think that's gonna be an issue.
Speaker 2:That's interesting.
Speaker 1:And you can imagine way worse versions of that, which I'm sure are really happening, where a serious person, like, maybe a graduate student Yeah. Who's stuck on a problem Uh-huh. You know, uses one of these models to generate a nonsense proof of a lemma that they're stuck
Speaker 3:on. Yeah.
Speaker 1:Where, you know, it maybe doesn't make sense. And there, you know, 99% of the paper is probably correct. Yeah. Yeah. But it has no value because there's Right.
Speaker 1:Some nonsense.
Speaker 2:Oh, that's an interesting
Speaker 1:case. And that's very hard to catch. I mean, that of course already happens without
Speaker 2:Without AI. People have been bullshitting since time immemorial. Yeah. But
Speaker 1:There are lots of wrong papers out there. But, you know, a lot of this is about the marginal cost of doing something, and the marginal cost of lying and cheating is getting a lot lower.
Speaker 2:Again, if capabilities froze, is this sort of at the level that society manages, because society generally, you know, muddles through? Or are there any things you'd be particularly worried about?
Speaker 1:I mean, I think it's just contributing to something that already exists. Like, there's right now in mathematics, I think, a refereeing crisis Mhmm. There are way more papers Yeah. Being generated than can be refereed carefully. And I think it would contribute to that. Probably it would continue to worsen.
Speaker 2:Right.
Speaker 1:A lot of that is about the incentives in math academia Sure. And not the models themselves. I think we'd manage the way we've been managing, but Imperfectly. Yeah. I mean, I do think there's some hope here.
Speaker 1:Like, of course, the models can also help with that. Yeah. Yeah. Help check papers, and there are already sort of nice tools being developed.
Speaker 3:Yeah. So building on top of that, I wanna think longer term now.
Speaker 1:Great.
Speaker 3:A big part of all of this progress in AI and math is about, like, compute and scaling. Mhmm. And one of the things that we discussed earlier was being able to, you know, get the AIs to run lots of examples, help you work through lots of examples, and do this kind of stuff. So I'm kinda curious how we should expect the fields of math to evolve as we get more of this ability to run lots of experiments at scale. So I'm kinda thinking, like, you know, with previous numerical simulations, maybe that enabled, say, lots of simulations in meteorology Mhmm.
Speaker 3:And economics. What should I expect in the case of math, where we have, like, systems that, say, are able to solve FrontierMath and we're able to run lots of these models at the same time?
Speaker 1:Yeah. Yeah. So here, just to make sure I understand, you're asking about, like, the sort of thing I was proposing earlier where you're just sort of working out lots of examples, not, like, trying to get the models to solve the Riemann hypothesis.
Speaker 3:Exactly. Yeah. Yeah.
Speaker 1:So I think we'll see kind of a continuation of previous trends, like, you know, the ones coming with the advent of computers. Like, we'll be able to work through larger volumes of interesting examples. Like, maybe, I don't know, I wanna find an algebraic variety with properties x, y, and z. I can, like, just have the model start trying things like that. The main benefit I think you get is, like, the cost of trying the first dumb thing you would think of gets very low.
Speaker 1:Like, you know, historically, if I want to, you know, find a construction, I have to sit down and try a few things, even if doing that requires very little cleverness. You know, it still takes a few days of my time, maybe. And, you know, I'm a busy guy. Moreover, you know, this might be just an idle question, and, like, I have other things I care more Sure. About.
Speaker 1:So there's some opportunity cost to doing these things. That opportunity cost, and also just the monetary cost, the marginal cost of trying something, will get very low, even if you're asking a model that's, like, not that capable to do it. That's Yeah. That's very valuable. So, yeah, I think, you know, one way mathematics moves forward is, like, you just look for examples of cool stuff, and then every once in a while, you discover such examples.
Speaker 1:That doesn't necessarily require, like, deep insight or brilliance. It just requires spending some time. And so having automated search for cool stuff Yep. Would be a really big deal. So here I'm thinking about, like, I don't know.
Speaker 1:There are some kind of sporadic examples of cool things, like the sporadic finite simple groups or the exceptional Lie groups Mhmm. Where, you know, of course, people were searching for them in some kind of fairly principled way. But, you know, ultimately, at some point, you have to make a discovery, and a lot of the time, you just make that discovery by, like, noticing someone worked out a cool example,
Speaker 2:and then you,
Speaker 1:like, observe some interesting property of that example. That's happened to me. I mean, I think some of the projects I'm most proud of are, like, you know, I noticed something cool in the literature and sort of drew some consequences out
Speaker 2:of that. Interesting. So that can be very productive.
Speaker 1:Yeah. I think that can be very productive. I think it's a big part of how math moves forward. Like, it's not just that, like, you know, the most brilliant people are proving amazing theorems. Like, there's a huge background of, you know, people like me who are, you know, maybe more workmanlike, you know, doing work and, like
Speaker 2:Yeah.
Speaker 1:Yeah. Just thinking about lots of cool stuff. And every once in a while, they'll find something important. And I think, you know, that's something where just the ability to automate is probably a huge deal, even if the automation is not at the level of, like, you know, even the median professional mathematician.
Speaker 2:Sure.
Speaker 1:So, yeah, that's, I think, the primary way I expect near-term, sort of mathematically capable models to make a big difference.
Speaker 3:How quickly do the returns to running these kinds of experiments diminish? Like, you know, I kind of imagine if I had loads and loads of these AIs working through these examples, maybe, like, a thousand examples isn't really that much more helpful than a hundred. So, like, would it be bounded?
Speaker 1:I think it depends a little bit on how you're searching. Right? So the most interesting examples, at least for me, are where there's an infinite collection of objects, and you're looking for 26 of them, but they're really very special, exceptional things. So, you know, it's not like you're doing a brute force search.
Speaker 2:Sure. Right.
Speaker 1:You're doing some cleverness, and, you know, the models can automate a little bit of cleverness. So I think that's yeah. I can imagine situations where being able to search through a million things is way more valuable than being able to search through a hundred things.
Speaker 3:Mhmm. And does that imply that, like or which particular fields would be most amenable to this?
Speaker 1:I think everything.
Speaker 3:Everything. Okay.
Speaker 1:There's lots of you know, I guess I'm thinking of algebraic geometry specifically, where there are certain sort of exceptional constructions I like. You know, the famous examples are the 27 lines on a cubic surface and the 28 bitangents to a plane quartic and so on. And these examples have some interesting properties; they turn out to be connected in various ways to the exceptional groups, for example. I mean, this is all known classically, from the 1800s, but, you know, it would be amazing to find I think this is sort of part of a series where we
Speaker 2:Mhmm.
Speaker 1:There provably cannot be more examples of that series, but it would be amazing to find, like, you know, new cool weird things like that. Yeah. I think that's and that's not something that seems impossible to me. You know, people do regularly find sort of cool examples of the same nature, like weird phenomena. Yeah.
Speaker 1:Mhmm. Yeah.
Speaker 3:Yeah. So it sounds like compute is, like, really important then. Then this sort of raises the question to me of, like, why aren't all the mathematicians trying to go to a big lab and work with, like, you know, OpenAI or DeepMind? Why aren't they going to places where there's a huge ton of compute and trying to ride the wave and exploit it?
Speaker 1:Yeah. I mean, to be honest, I think model capabilities are not really there yet. Right? Like, right now, the way you would try to automate this, maybe, is you just, like, run a model in a loop saying, like, look for a cool example of this sort of phenomenon. Like, here are the examples you've looked at so far.
Speaker 1:You know, when you try to ask the model to do a question like this, it will basically fail 100% of the time. So, I mean, I think, at least in algebraic geometry, the capabilities are just not there yet. Like, they just can't work through an interesting example. Mhmm. So maybe that'll happen soon.
Speaker 1:I don't know.
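(For concreteness, the "run a model in a loop" setup described above might look something like the toy sketch below. The function name and the stand-in propose/verify callables are hypothetical, not any real lab's system; AlphaEvolve-style pipelines are considerably more elaborate.)

```python
from typing import Callable

def llm_example_search(
    goal: str,
    propose: Callable[[str, list[str]], str],  # stand-in for a call to an LLM
    verify: Callable[[str], bool],             # ordinary, non-LLM checking code
    budget: int = 100,
) -> list[str]:
    """Toy loop: ask for a candidate example, show the history so far,
    verify cheaply with deterministic code, and repeat up to `budget` times."""
    tried: list[str] = []
    found: list[str] = []
    for _ in range(budget):
        candidate = propose(goal, tried)  # "here are the examples you've looked at so far"
        tried.append(candidate)
        if verify(candidate):
            found.append(candidate)
    return found

# Purely illustrative run with stand-ins instead of a real model:
hits = llm_example_search(
    goal="even numbers expressible as a sum of two primes",
    propose=lambda goal, tried: str(4 + 2 * len(tried)),  # fake "model" proposals
    verify=lambda s: int(s) % 2 == 0,                     # fake cheap check
    budget=5,
)
print(hits)  # ['4', '6', '8', '10', '12']
```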
Speaker 3:Mhmm. But we also think, like, if we extrapolate the trend on FrontierMath, then maybe by the end of, like, next year, it's probably gonna be saturated. So, like, should we be
Speaker 1:I think saturating FrontierMath is somehow not relevant to
Speaker 2:So let's maybe ask about timelines more, just Yeah. On the scales you care about. You made this bet with former Epochian Oh. Tamay Besiroglu. Yeah.
Speaker 2:I think it was well, we looked this up, so I hope I'm getting this right. Three to one odds that by early twenty thirty, call it four or five years from now Mhmm. Time has progressed, so it's not five it was five years at the time of the bet that an AI system would not be able to more or less autonomously produce a paper that would be publishable at today's standards in one of the top Yeah.
Speaker 2:Journals. The Annals. The top journal Yeah. In your field. And so, if I'm not getting the arithmetic wrong, that's you giving that a 25% chance
Speaker 1:That's right.
Speaker 2:Of that happening.
Speaker 1:Yeah. I think there's some chance I was overconfident. I think I've probably revised my estimate more towards Tamay's side since then.
Speaker 2:It sounds like, if I recall, Tamay, though maybe I'm misreading your perspective, felt the evidence was saying he should be less confident.
Speaker 1:Yeah. I think actually Tamay has moved towards me and I've moved towards him since Beautiful. That's interesting. Yeah. Yeah.
Speaker 1:It's also clear to me I mean, I think right. So this is something like and there were a few more constraints
Speaker 2:Sure. Please. Yeah.
Speaker 1:They're interesting. So first of all, it has to be a repeatable capability. Although, you know, of course, if the model proves the Riemann hypothesis, probably we're
Speaker 2:past that. Yeah. Yeah. But repeatable for the sake of it not being, like, a quirk of
Speaker 1:Yeah. It's not like it just finds one weird example that's a counterexample to some major
Speaker 2:And and everyone cares just enough. Yeah. The stars align just regularly producing Gotcha.
Speaker 1:Stuff. And then there was a cost constraint. Right. The marginal cost had to be about a $100,000, which was some kind of estimate of the marginal cost of getting a human mathematician to produce an Annals paper. Sure.
Speaker 1:So, you know, I've produced an Annals paper. Many other mathematicians produce Annals papers and, you know.
Speaker 2:You don't make millions either. So, you know.
Speaker 1:Yeah. And I think that's a little bit hard to evaluate, in the sense that probably we're not gonna spend a $100,000 of Sure. Compute trying to do this. Although, maybe
Speaker 2:Maybe OpenAI will.
Speaker 1:Yeah. But, you know, maybe we expect the cost to drop. So if in 2031, it's doing it for a thousand dollars, then
Speaker 2:Easy resolution. Yeah.
Speaker 1:Great. Yeah. So, sorry, what was
Speaker 2:the question? Well, one good question here is so first of all, just acknowledging, 25% of this happening in five years, like, that's a lot of progress from where we are today.
Speaker 1:I think that's very plausible as a forecast.
Speaker 2:I mean, we could do a lot with that alone. Yeah. Ask about what math looks like then. But I'm curious about your updates since then, like, why
Speaker 1:Why I might have moved, and in which direction. I think 25% now feels a little bit low to me, though that's based on nothing besides vibes. I actually think it's reasonably likely that models will be able to autonomously produce, you know, very high quality research in that time. So that was supposed to be a proxy for: will models have an absolute advantage over humans in doing research? And I think I've come to the view that it is a very poor proxy.
Speaker 2:I see. Interesting.
Speaker 1:And so one reason is the following. So right now, there are mathematicians for whom, you know, the marginal cost of having them produce an Annals paper is much lower than the marginal cost Sure. Of me doing it. You know, they're doing one a year. You know, I do it.
Speaker 1:I've produced one Annals paper so far and then a few other papers in comparable Yeah. Venues. So yeah. So, nonetheless, like, you know, they don't have an absolute advantage over me in some sense. Right?
Speaker 2:Yeah. Yeah.
Speaker 1:We have different points of view, and we're doing mathematics in different ways. So, like, there are areas where I have an advantage, or, you know, there are areas where I have a substantial advantage in proving an interesting theorem or, you know, understanding interesting mathematics, which I view as the actual goal.
Speaker 2:So if I hear you correctly, it's like, if models five years from now have the same characteristic they sort of seem to have today Right. They're all kind of doing the same thing. It's like one mathematician. That's
Speaker 1:right. Yeah. So I think you can absolutely imagine models that are very strong at certain, you know, even doing certain types of mathematics Yeah. Or proving theorems of a certain shape or even proving theorems of many, many different shapes.
Speaker 2:Yeah.
Speaker 1:But you would still maybe expect math you know, the fact that Maxim Kontsevich is around doesn't mean that there's no role for other mathematicians.
Speaker 2:The cost criterion is relevant there
Speaker 1:Mhmm.
Speaker 2:A little, because let's just imagine that the cost was zero Yeah. For this, and that, you know, we weren't at 200 tokens a second, which feels like a bizarre coincidence, that we are sort of close to human speed of thought. Right. For models, I think of computers as much faster than us at arithmetic. So Yeah.
Speaker 2:So putting those aside, which are totally real practical
Speaker 1:Imagine we can instantly get some Annals papers.
Speaker 2:And I guess my question is just well, more almost from the human analogy, though tell me if you think this isn't fruitful. If you put your own brain in a data center and Mhmm. Could, you know, run a hundred subjective years in a minute. Yeah. Like, can you give every mathematician in the world a run for their money?
Speaker 2:Like, maybe it takes you longer to do the sorts of things you're not so good at as a human, but that's an upper bound from a 2025 human perspective. What is the electronic version? That's a good question. I think it's hard to say.
Speaker 1:So right now, because I'm limited in various ways, I mostly work in areas where I'm very good. Yes. And so, you know, every once in a while that will involve learning a new technique, or very frequently it will involve learning a new technique and/or, you know, developing a new technique, which is something I think the models have not yet really shown the ability to do, and which takes a lot of effort. I think if I was not so constrained, like, I could start trying to learn other people's techniques Mhmm. More easily and, you know.
Speaker 1:So that's a good question. But, clearly, the answer depends a little bit on my capability of doing some kind of continual learning. Sure. So I think when we're comparing to a model, maybe it depends a little bit on how things develop in that direction. That said, my expectation is that, no, I would probably do a lot of cool mathematics, I hope.
Speaker 1:Yeah. It'd be very embarrassing if I was given so many resources and was not able to do it. But, no, I think it's quite clear that other mathematicians would still be very valuable. And there are certain, I would say, styles of thinking, or modes of thought within mathematics, that I do not feel suited for. Like Yep.
Speaker 1:Maybe that's some, you know, kind of comparative advantage calculation.
Speaker 2:Yeah. I mean, this whole question,
Speaker 1:just to
Speaker 2:summarize it, is does AI break down comparative advantage dynamics that we're so familiar with?
Speaker 1:Yeah. And I think I don't see a reason to think that. I mean, of course, we also asked the question without resource
Speaker 2:Yes. Which is, of course, somehow the key Critical.
Speaker 1:Of analyzing comparative Comparative advantage.
Speaker 2:That that that's right.
Speaker 1:A little hard to answer.
Speaker 2:So this gives us a sense of what you sort of think five years from now might look like Yeah. Where there's the math that the AI just happens not to be good at, humans gravitate toward that, and then a few humans are guiding the math that the AI is good at and
Speaker 1:building
Speaker 2:on
Speaker 1:Yeah. And I imagine one would also try to automate systems and, you know, try to develop some kind of taste evaluation for the AI and try to get it to do interesting things on its own too. So I think that's a possibility, you know, in five years. Yeah. Okay.
Speaker 1:Maybe 25% is still around where I'm at. Sure. I would go a little higher. I think it's very likely, you know, within fifteen or twenty years, that there will be a lot of areas of mathematics where automated systems have advantages over humans. I would be surprised if Yeah.
Speaker 1:It doesn't happen. And
Speaker 2:this is a little our area, so I don't mean to unfairly ask you to do our job for us. But when you're saying five, fifteen, twenty, what are you using to draw the trend line? Like, what's Vibes is a totally fine answer.
Speaker 1:Primarily vibes. I mean, I do think I'm participating in the age-old tradition of AI forecasting, where five years means you think it will happen soon, and more than five years means anything from, you know, five years to
Speaker 2:To never.
Speaker 3:Yeah. Yeah.
Speaker 1:So I think that's, you
Speaker 2:know Okay.
Speaker 1:Fair, fair. A lot of people were forecasting AI progress in the sixties and seventies. So I think we're saying fifteen to twenty years in the same Yes. Same way.
Speaker 2:Yeah. Whatever. There's sort of today, there's soonish probably maybe, I don't know. And there's like, oh, come on.
Speaker 1:Who knows? And then yeah. What I'm saying is that I think it's, at that 25% Yeah. Reasonably likely, but far from certain, that if current trends continue, we'll see kind of high quality automated research.
Speaker 3:So maybe we should see what happens when we go actually fifteen or twenty years, or perhaps even more, into the future. So one thing I'm especially curious to get your take on is, like, the returns to scaling. And there are kinda, like, multiple dimensions along which we could do this. So one dimension is kinda like what we were discussing of, like, thinking longer. Yeah.
Speaker 3:Another one is, like, having many more researchers.
Speaker 1:Mhmm.
Speaker 3:And the last one is kinda like the scaling of intelligence. So maybe let's start off with the first one. It's like, how do we compare the scaling of, like, maybe one person thinking for a hundred years as opposed to a hundred people thinking for one year? And is there a way to characterize how much better the person gets over the course of this hundred years?
Speaker 1:That's a good question. I mean, so maybe let me kind of connect this to something we were discussing earlier, which is, like, how to decide how hard a problem is.
Speaker 2:Mhmm.
Speaker 1:So I think there are a lot of problems which clearly have the flavor of, if you had a Manhattan Project to solve them, they would be solved pretty fast. Or, you know, like, the primary thing missing is attention. Yep. And if you somehow just got a few mathematicians to really devote their Yep. Time to them, like, it's almost certain they would be solved.
Speaker 1:Okay. I think Mhmm. If you want me to name such problems, I probably can.
Speaker 2:Sure. I mean, I'm curious.
Speaker 1:Or So I have in mind, you know, for example, we've discussed privately the inverse Galois problem for the last simple group Uh-huh. For which it's not known, the Mathieu group M23. I think that's probably a hard problem. A bunch of people have thought about it. I absolutely think that if, you know, if we had the US government devote you know, whatever, the US government spends a $100,000,000 on mathematics annually, historically, although maybe less this year.
Speaker 1:We'll see. But if they devoted a substantial portion of that to solving that problem, it would get solved. Yeah. I think that yeah.
Speaker 2:That's a good idea. That's helpful. Yeah.
Speaker 1:And that yeah. So that's maybe talking about just devoting some kind of resources, broadly speaking. So if you ask a single person to think for a long time, I think there are also a lot of problems where you sort of expect a solution. So maybe as an example, there's a problem I started thinking about in maybe 2016; it sort of came up in a talk I saw. And I really liked it, and I tried to solve it, and I didn't really get anywhere at the time.
Speaker 1:But I kind of kept it in the back of my head and then, you know, solved it in 2024. And so I think Mhmm. You know, you learn stuff. You connect it to the sort of things you're thinking about. You know, there's just some problem of, like, matching your knowledge to a conjecture, like, knowing a conjecture even exists.
Speaker 1:And, you know, so as time passes and you just try various things, sometimes you eventually hit on a solution. And here, in fact, you know, part of the issue was, like, one of the ingredients of the solution just did not exist until 2023.
Speaker 2:Oh, that's amazing.
Speaker 1:Sometimes you just have to wait for someone else to do some work, and you realize it's relevant, and
Speaker 2:then you That makes the individual mathematician thinking actually sound a little bit more like a group of mathematicians Yeah. Because someone else does another piece. Yeah.
Speaker 1:Yeah. So right. So some of it is just time passing. You learn things. You try different things.
Speaker 1:You become a better mathematician in various ways; you learn techniques. Also, you know, a big part of mathematics, and I think this is something where humans still have a very substantial edge over the models, is, like, you have a new object and you try to make friends with it. Like, you play around with it, ask, you know, questions about it, work out some examples and relevant special cases of the thing you're interested in. And I think this is something where, you know, you learn about an object that way and develop some theory for it.
Speaker 1:And this is something where I really worry about model capabilities in the current paradigm, like, trying to understand the sort of marginal cost of progress.
Speaker 3:Mhmm.
Speaker 1:Right? Like, right now, for a model to learn about a new subject, or, like, to have a tool that has learned about a new subject, you have to train a new model.
Speaker 3:Yeah.
Speaker 1:Right? Like, don't know. So Peter Schultz is developing this theory of Gestalten, which is some kind of new new kind of space. And none of the models know about it. Yeah.
Speaker 1:Yeah. You know, I guess they can go online and look at the paper, but, you know, they don't they don't
Speaker 2:Skimming at level.
Speaker 1:Yeah. They can't, like, take the object and try to work with it Yeah. Yeah. And make friends with it. And, yeah.
Speaker 1:So this is something that a human can do over time, and I think that's somehow the main way that time helps in solving a problem: you kind of become friends with the objects Mhmm. And develop some intuitions, and the models just can't do that without training an entirely new model. So the Yeah. Marginal cost of solving a problem that involves some kind of new object or new way of thinking is the cost of training a model. Yeah.
Speaker 1:It's huge. Like, way worse than asking a human to try to understand the object.
Speaker 2:So continual learning seems
Speaker 1:Yeah. Seems very very important.
Speaker 2:Given for this.
Speaker 1:Yeah. And it's somehow not just continual learning in the sense that I can read a paper and understand the object; it's that the understanding builds up inside you. If you expect the models to behave like a person, it's sort of, yeah, playing around. And I guess we have seen some attempts maybe at getting models to do some sort of self-play with mathematics, but it's not clear to me what kind of success that has had.
Speaker 2:I mean, it's interesting. Yeah. I don't know. You see the chains of thought, and I feel like you certainly see glimmers of this, where they will try different things and play around a little and eventually hit on something, probably because it's what they've been trained to do for the sake of solving a concrete Yeah. Problem rather than this more taste-oriented Yeah.
Speaker 2:Thing of, like, okay, I've got a better angle or a better grasp on this. It does feel like maybe that scales. Yeah.
Speaker 1:I'm not necessarily skeptical. I just don't know if
Speaker 2:That's what
Speaker 1:it will take. Yeah.
Speaker 2:Makes sense.
Speaker 3:Yeah. I feel like the other part that I found really interesting about that was, like, kinda like the limits to how much you can parallelize this.
Speaker 1:Mhmm. Mhmm.
Speaker 3:Because, like, you know, there was this dependence on another result from 2023, was it? Yep. And so if we're comparing, like, Daniel thinks for a hundred years versus a 100 Daniels Mhmm. Maybe the hundred-years thing is probably more productive in the long run with the continual learning, but maybe there's also a prioritization bottleneck, or you need something else to happen broadly throughout the math economy or something.
Speaker 1:Yeah. I think that's accurate. Right? So, like, for example, in this case, this result took input from sort of two deep theories.
Speaker 3:Mhmm.
Speaker 1:So one was non-abelian Hodge theory, and the other was the Langlands program for function fields. And the sort of main thing that happened that allowed this project to go through came from someone else. So Esnault and Groechenig had proven something taking input from the Langlands program for function fields and the theory of companions, whatever that means. And I think that's not something I would have thought of. Mhmm.
Speaker 1:It's, you know, okay. It's close enough to my area that I knew about it and
Speaker 2:And knew how
Speaker 1:to put it to use. Yeah. There's some chance I would have, like
Speaker 2:I see.
Speaker 1:Thought about the question at least. But it's a really very creative and important development that was kind of different from the things I was thinking about. So I think it's really not clear I would have even known it was relevant in some sense. Mhmm. Like, if I was just thinking about this problem for a
Speaker 2:hundred years.
Speaker 1:I mean, historically, asking a human to think about just one problem for a long time in isolation is probably not such a productive thing to do. Like, you know, you try a bunch of things and at some point you start hitting diminishing marginal returns. Right. Yeah. Mhmm.
Speaker 2:I guess with the parallelization, it sounds like this is often the case. Diversity. Diversification is Yeah. Important there. Yeah.
Speaker 2:And I guess with today's models, have you tried to get them to do something surprising or more original, even if wrong, and
Speaker 1:Like, to yeah. Demonstrate creativity. Yeah. Sure. I mean, I would say right now, anytime I think about an open problem, the first thing I do is ask models for some ideas.
Speaker 1:Yeah. They're almost always sort of nonsense.
Speaker 2:Yeah. Yeah.
Speaker 1:Yeah. So yeah. I would say I've maybe never gotten an idea that sort of passes the sniff test for a deep open problem, like
Speaker 2:And so, I mean, it feels like random initialization in some abstract way should solve this problem, but we're just not seeing it.
Speaker 1:Yeah. Yeah. And you can also try, you know, every day I wake up with a different random initialization
Speaker 3:That's right.
Speaker 2:That's right.
Speaker 1:I think there's still, you know, some kind of persistence in who I am.
Speaker 2:I see. Yeah. Ability
Speaker 1:to Totally. Really kind of search the full space of mathematics, and so yeah.
Speaker 3:I guess the other aspect of this was the returns to intelligence. Yeah. Mhmm. So there was, like, comparing you know, I think if you had a million Ansons trying to do the stuff that you do, I think I would not make very much progress, even if I, like, spent a lot of time trying to learn, like, full-time, just learning.
Speaker 3:Mhmm. And then maybe, like, the next scale of this is, like, a million Daniels compared to superintelligent mathematicians.
Speaker 1:Mhmm.
Speaker 3:Like, would we expect this to have the same kind of dynamic, where the superintelligence is just so much better that, no matter how much you scale up the number or the duration of our researchers, does this keep going up, like, all the
Speaker 1:way up? Yeah. So let me first push back on the claim about you versus me, and then we'll talk about me versus superintelligence. I actually find it plausible that if you really devoted a long time to learning cool mathematics, you would be able to do cool mathematics. It probably would be different math from what I'm doing, just because we have different preferences and different capabilities, you know, we're stronger in different things. But, you know, you seem like a sharp guy, yeah.
Speaker 1:Oh, thank you. Yeah. Well, like, I don't see a reason to think you wouldn't do interesting mathematics if you had the motivation and resources to devote to it. So similarly, I think with a superintelligent AI, whatever that means, well, intelligence, even just within a fairly narrow domain of math research, is an exceedingly high dimensional thing. Like, we see just comparing different human mathematicians that they were very good at very different things.
Speaker 1:Yeah. And so I think even if you imagine an AI which is very, very capable of solving interesting research problems, it's not clear that it's very capable of solving all the interesting research problems that all humans are good at.
Speaker 2:But I suppose we could try to sharpen the question Yeah. A little by saying, so, whatever, Terence Tao is famous, but you've mentioned others, perhaps in other fields. Sorry, the name is Scholze, I guess.
Speaker 1:Scholze and Kontsevich or something.
Speaker 2:Yeah. Yeah. Exactly. So within whatever net you wanna cast for people who are well suited by interest or disposition to a particular field. Mhmm.
Speaker 2:Like, what do we observe in
Speaker 1:There are still huge differences. Right? So like, you know
Speaker 2:It's so high dimensional. Yeah.
Speaker 1:Maybe an example: there's a sort of classical dichotomy within any field, that of theory builders versus problem solvers. Mhmm. And I think that's, you know, in large part about interest, but also about capability. Like, I think of myself maybe as more of a problem solver than a theory builder. And I think part of that is just that I'm not as good at theory building.
Speaker 1:Sometimes I have to do it to solve a problem. But, you know, it's partly what I'm good at and partly that I enjoy trying to solve interesting problems. But even that dichotomy is some kind of continuum, or high dimensional continuum; like, you know, some problems are "develop a theory to do this."
Speaker 2:Well, I still feel like we could sharpen this, because Sure. You just say, okay, add whatever dimensions you want, and now we see a spectrum of
Speaker 3:Sure.
Speaker 1:Yeah. Of course, you can imagine an AI which exceeds human capability in every dimension.
Speaker 2:Yeah. I guess what I'm curious about is whether, just from human evidence, do we see I have in the back of my mind, I should just say it, this kind of lighthearted quote by, I think, Fefferman when Tao got his Fields Medal, saying, you know, if you get stuck on a problem, the surest way out is to interest Terence suggesting sort of that there was the sense of
Speaker 1:Yeah.
Speaker 2:Yeah. You know, like if you get a more capable human on a problem that they're a good fit for like, a theory-building project for a theory builder in an area that they're interested in, or something like that. Like, do you I don't know.
Speaker 1:I think "that they're a good fit for" is the key there, though. Right? Like, I think this may be a matter of expertise rather Yeah. Rather than capability. But, you know, I think if I got Tao interested in one of my problems, it probably wouldn't be that helpful in the short term.
Speaker 1:Like, you know, maybe he would load up a huge amount of geometry or arithmetic geometry into his brain. Yeah. Yeah. And then that would be helpful. Yeah.
Speaker 1:You know, that would take a while.
Speaker 2:And I guess the question is just, for whoever is smartest, subject to being a good fit, for lack of better terms, what that scale of productivity looks like. I know there are people where you avoid working on a problem if you know they're working on it, because you're like, oh, surely they're gonna crack this before I do.
Speaker 1:I would say that happens in some sense, but I don't think of it as about capabilities. Mhmm. Like, for example, a very active area now is p-adic Hodge theory. I'm adjacent to it in various ways. Sure.
Speaker 1:I consume the output of p-adic Hodge theory I don't really work in the area, but that's just because it's a very fast moving area. And, like, if I wanted to get into it, I would have to learn a lot and then catch up to these people who are working very quickly.
Speaker 2:Got it.
Speaker 1:So it's more about opportunity cost than capability.
Speaker 2:So we don't see so much, like, the return to whatever we call intelligence in humans. It's not
Speaker 1:I mean, you know, I have a two-and-a-half-year-old who I think is very sweet and very sharp, but, you know, she's not
Speaker 2:Not so
Speaker 1:good at math research right now. She's two and a half. So, yeah, there are absolutely differences in capability between people. But, yeah, I think among professional mathematicians, it seems to me mostly about the fact that it's a high dimensional space and people are interested in Specialized. And specialize in different parts of that space, even within sort of fields you feel are kind of narrow.
Speaker 2:Yeah.
Speaker 1:Yeah. Yeah. Maybe a remark on that is, I think, you know, if models kind of continue in the current paradigm, where they are kind of very jagged and have, in some sense, sort of narrow capabilities
Speaker 2:compared
Speaker 1:to a person, Even if that happens Yeah. I still imagine some kind of very substantial value, just because the fact that we specialize in these sort of tiny little zones in a very high dimensional space means that basically all problems are kind of attention bottlenecked. You know, the number of people who even know what the words in some Right. Right. Problems even mean is, like, 10.
And you can imagine that in some sense those problems are not very difficult; it's more of a paradox. Yeah. It's just that, like, no one has really had time to think about them.
Speaker 3:Yeah. Yeah. I guess, like, another way of thinking about this, in a way to say to people who care about superintelligence and fast AI takeoffs and whatever Mhmm. Is, like, suppose we increase the rate of math progress by a certain amount, based on some measure that most people agree is reasonable. Uh-huh.
Speaker 3:And then we try to break this down because, say, fifteen to twenty years into the future, AIs are doing the math research. And we break it down into the three factors that we were talking about: more AIs, AIs thinking for longer, and smarter AIs. Sure. Sure.
Speaker 3:Like, do we have a sense of what the breakdown would be?
Speaker 1:That's a good question. I mean, okay, it's not clear to me to what extent progress in math research has increased over the last forty years or whatever, let's say prior to AI. But my sense is that productivity growth, like, total productivity growth, comes down largely to the increased number of mathematicians. Mhmm.
Speaker 1:So, you know, maybe once you expect AI to reach a certain level of capability, like at parity with humans, if this happens, maybe you expect it to come down to numbers. A little bit of a concern there is, you know, why are there returns to numbers of humans? I think part of it is just attention. Part of it is this diversity of modes of thought, and maybe you don't have as much of that with AIs, depending on how Mhmm. How systems develop.
Speaker 1:So I don't know. But I do think, you know, probably a lot of the returns to an increased number of mathematicians just comes from increased time spent thinking about different problems
Speaker 2:Mhmm.
Speaker 1:Increased attention, and there I expect you could see substantial returns even with sort of not very capable AI. So yeah. So what about returns to intelligence or whatever? Again, I think this is sort of not such a well-posed problem. Like, intelligence is a very high dimensional object in my view.
Speaker 1:Mhmm. But yeah. Okay, maybe let's call it something like the ability to search through some high dimensional space of proofs in a clever way. I guess you can imagine that being very valuable; like, presumably the proof of the Riemann hypothesis will involve developing various objects and theories that don't yet exist, and you'd have to search for them in some very high dimensional search space.
Speaker 1:But I don't know. It feels sort of like too much speculation. Like, I don't know. Yeah. Yeah.
Speaker 1:And in fact, I think a lot of the way that search happens in practice is that people who are, you know, just normal mathematicians are sort of thinking about objects and doing this sort of low level search I was discussing, like, working through lots of examples and noticing patterns, and then people start to encapsulate those patterns into relevant objects. And really, I think that sort of high level theory building, which we attribute to maybe the most brilliant mathematicians, really depends on a background of just, like, hard work and example analysis. Mhmm. So, you know, arguably some of that is also attributable to scale, the number of mathematicians and the number of questions people have thought about.
Speaker 2:As a fun little aside, what are your solve probabilities over time for the Riemann hypothesis, or pick your Millennium problem if that's not your
Speaker 1:Yes. Okay. I mean, the ones I'm probably most familiar with are the Riemann hypothesis, Birch and Swinnerton-Dyer, and the Hodge conjecture. Mhmm. I think of those, the Riemann hypothesis actually is the one where in some sense we know what a proof should look like.
Speaker 2:Uh-huh.
Speaker 1:And the reason why is that there's an analogous conjecture, the Weil conjectures
Speaker 2:Yeah.
Speaker 1:Which is the Riemann hypothesis for varieties well, part of it is the Riemann hypothesis for varieties over finite fields
Speaker 2:Mhmm.
Speaker 1:That was proven by Deligne in the seventies. And maybe the part that's most closely analogous to the Riemann hypothesis is the Weil conjectures for curves, which were proven by Weil much earlier.
Speaker 3:Yeah.
Speaker 1:Yeah. So we really have some sense of what the shape of a proof should look like. It should look like Weil's proof of Yeah.
Speaker 2:Yeah. Yeah.
Speaker 1:Well, he gave two proofs, but one of Weil's proofs of the Riemann hypothesis for curves. Mhmm. And then the problem is that there are various maneuvers one makes in that proof that don't make sense Don't work. For, like, the integers Yeah. Rather than a curve over a finite field.
Speaker 1:And so one has to figure out how to make those maneuvers work. I mean, this is sort of science fiction, but Yeah. Yeah. I think this is at least some kind of plausible, you know, mystical way of thinking about a proof. So, yeah, of course, many mathematicians have thought about how to try to make these maneuvers work, and I think there's some chance that one of those attempts will work out in the next ten years.
Speaker 1:I don't know. I mean, the number I put on it is like 15% or something. Gotcha. But that's just meant to be, like, fairly small, but not Sure.
Speaker 2:Small, but not zero?
Speaker 1:Yeah. And in general, I think if you look at solutions to major open problems, the time difference between the sort of last big idea you need Yeah. And completing the proof has been fairly small. Like, you know, for Fermat's Last Theorem, the distance between the Taniyama-Shimura conjecture
Speaker 2:Yeah. Sort of
Speaker 1:sort of a very hard conjecture related to the Langlands program, and the proof of Fermat's Last Theorem was not
Speaker 2:Mhmm.
Speaker 1:Temporally not so far apart. Yeah. Yeah. Sorry, for Fermat's Last Theorem the gap is not so large.
Speaker 2:That does make it sound a little like, maybe because the Weil conjectures for curves are, you know, decades old, that maybe Yeah.
Speaker 1:So in fact, people expected, I think, when Deligne proved the Weil conjectures, that the proof of the Riemann hypothesis would come soon. That was well before I was born, but that's my understanding of the history. And then it didn't. So maybe some People don't know. Yeah.
Speaker 1:So I think people don't really know what the missing new ideas are. You know, we have some sort of very broad sense of what their shape should be.
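As a gloss on the analogy described above, here is the shape of the two statements in standard notation; this is our summary, not part of the conversation:

Riemann hypothesis: every nontrivial zero of $\zeta(s) = \sum_{n \ge 1} n^{-s}$ satisfies $\operatorname{Re}(s) = \tfrac{1}{2}$.

Function-field analogue (proved by Weil for curves, and by Deligne in general): for a smooth projective curve $C$ over $\mathbb{F}_q$,
$$Z(C,T) = \exp\Big(\sum_{n \ge 1} \frac{\#C(\mathbb{F}_{q^n})}{n}\, T^n\Big) = \frac{P(T)}{(1-T)(1-qT)},$$
where $P \in \mathbb{Z}[T]$ and every root $\alpha$ of $P$ satisfies $|\alpha| = q^{-1/2}$.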
Speaker 2:And then I guess the obvious follow-on is: has your predicted time frame changed with AI progress at all?
Speaker 1:That's a good question. I mean, right now part of the reason for my bet from earlier is that I don't feel like I've seen some aspects of what I think is necessary for high quality research. Got it. I don't think I've seen sparks of it from AI. You know, the distance between solving an IMO problem and doing high quality math research is larger in my view than some other people seem to think it is.
Speaker 1:Yeah. I would say it has not moved so much. I mean, right now, the tasks that AI can help with, and the tasks it seems like it will soon be able to help with, don't seem to me to be the primary bottlenecks to Yeah. Resolving major open conjectures. That said, I do think, you know, I gave a 25% chance that AI Sure.
Speaker 1:That these problems will be solved, that AI will be able to, you know, do high quality research. And I think maybe some 5% of that, depending on how that works, could be
Speaker 2:Could be so good that it's on track
Speaker 1:Yeah. To
Speaker 2:yeah. To do something, you know, crazy. Yeah. So just for fun, the Millennium Prize problems, these big targets in math: where are you on how likely you think those will be solved, and whether AI will contribute substantially to those solutions?
Speaker 1:Yeah. I mean, I I think my median prediction is that zero will be solved autonomously by an AI in the next ten years.
Speaker 2:Ten ten years? Yeah. I see.
Speaker 1:With AI contributing at all, I would say maybe zero to one I would expect to be solved in the next ten years. You know, there's arguably been some progress on Navier-Stokes, which is far from my area, but I don't find it totally implausible that that will be resolved. And, you know, the current news about it is about a team working jointly with DeepMind.
Speaker 2:Yeah.
Speaker 1:Mhmm. That would be with some kind of maybe more traditional deep learning techniques involved, not LLMs or reasoning models.
Speaker 2:And I guess I mean, one of the seven was resolved not too long after they were codified, the Poincaré conjecture. So our base rate is not zero. We shouldn't act like Yeah. These are impossible. They're
Speaker 1:just very hard. Yeah. But I think with a couple of them, like the Hodge conjecture Mhmm. It seems like there are just no ideas. I see.
Speaker 1:BSD is similar. Yep. With the Riemann hypothesis, I
Speaker 2:guess we discussed You mentioned. Yeah.
Speaker 1:There are some kind of ideas, but it's really unclear how far away we are. I think it could happen in the next ten years. It could happen in the next hundred years.
Speaker 2:And so relating this to your other AI timeline senses, the twenty-five percent chance that we do hit Annals-level papers in five years Yeah. Some part of that goes toward, you know.
Speaker 1:Yeah. But my expectation is that it seems very possible for that to happen without
Speaker 2:Yeah.
Speaker 1:Without starting to knock out Millennium problems. So
Speaker 2:basically not much of that. Yeah. And then you're sort of, like, twenty years
Speaker 1:Yeah.
Speaker 2:You're, you know, it's beyond the point when this all has had a lot more time to brew and develop.
Speaker 1:Yeah. And I think it depends a little bit on, you know we talked a little bit about acceleration. I'm not an expert in this, but presumably there's some question about acceleration of AI progress itself, or self-improvement, and I'm generally pretty skeptical of that
Speaker 3:Yeah.
Speaker 1:In the same way I'm skeptical of acceleration in general. Yep. But, yeah, we'll see, I guess.
Speaker 2:So this might slowly transition us into measurement and benchmarking. But first, sort of a fun speculative version of that. You've said you're not seeing so many glimmers of some of the things you think are Mhmm. Critical. Well, let me ask the boring version first.
Speaker 2:What are some of those things? You've talked about them plenty on Twitter, but just for our sake Right. Like what are some of those?
Speaker 1:Yes. You know, let me imagine what are some things that would be signs for
Speaker 2:me Oh, sure. Yeah. Yeah. Yeah. That's a fun version.
Speaker 1:Like, for example, making a new interesting definition Mhmm. Would be important. Or showing some kind of research taste would be important. So, like, asking a question, or discovering, even just conjecturally, some new phenomenon. Like, I think a lot of the most important mathematics is actually just making conjectures.
Speaker 1:Yeah. Yeah. That seems sort of hard to get current systems to do. I think it would be interesting to do experiments there. Yeah.
Speaker 1:Developing, you know, this is sort of related to making a definition, but Mhmm. Some kind of theory building, I don't think we've really seen. Like, somehow, it's already been very surprising to me that the models are able to learn during training. Yeah.
Speaker 1:I mean during training, not during use or deployment, learn techniques and, you know, apply them. That's somehow the thing that really changed with the Yeah. Reasoning models. Like, they were able to take some well-known techniques and now start to really apply them with high reliability to do math. Yeah.
Speaker 1:If they were to I mean, this is sort of a fuzzy thing, but if they were to develop a new technique Yep. I mean, there's some kind of continuum between old techniques and new techniques. Like, put a little twist Yeah. Yeah. Yeah.
Speaker 1:On an old technique. Is that a new technique? But, yeah, if I were to recognize something like that, that would be a sign. And I don't think any of those things have happened. Will they happen soon?
Speaker 1:Maybe. Don't know.
Speaker 2:And just in case it catches anything else in the net: if I were to ask, of the tasks that constitute doing math, which are the hardest for AI systems, this kind of theory building and conjecturing category seems critical. Are there others you might
Speaker 1:Yeah. I mean, right. So, like, for me, I try to have all of my papers contain a new idea. Uh-huh. And again, this is a fuzzy thing, new idea, new technique, whatever.
Speaker 1:It's not true for all of them. Yep. Sometimes you find a trick to resolve a small conjecture, and of course yeah. I think most people don't have a lot of new ideas. Like, mathematicians, I think, typically write, okay, depending a little on the area, one to two papers a year.
Speaker 1:So Yeah. Compared to other subjects, we're not very productive in terms of total output. And I think that thing where you develop a new technique is Yep. That's really the sort of genuine new idea. I mean, you know, what does that even mean?
Speaker 1:But I Totally. I know it
Speaker 2:When you see it.
Speaker 1:That's the hard part.
Speaker 2:Yeah. Yeah. And I think we've practically answered it already, but something people like to point out for the game-playing systems Mhmm. And Go in particular, AlphaGo, was move 37. Yeah.
Speaker 2:I suppose I think part of what was exciting about move 37 is it was described as an inhuman move Yeah. Like a move no human would have thought of. And so the AI had really stepped out of the bounds of I mean, it wasn't trained on much human data Right. But it had stepped outside the search space that humans had looked at. Right.
Speaker 2:It sounds like it would be enough of a move 37 to just have a new idea Yeah. Even if it was something a human could think of. I don't know.
Speaker 1:It's funny. I mean, in some sense that's happened prior to the deployment of these AI systems. If you think about, you know, the Kepler conjecture, the proof of the four color theorem, like, those are inhuman proofs. Like Sure. Of course, a human was in charge, but Yeah.
Speaker 1:Yeah. The majority of the work was some kind of Yeah. Horrible casework done by a computer or beautiful casework, sorry. Of course.
Speaker 1:No such thing
Speaker 2:as Does anyone Yeah.
Speaker 1:No. I'm not. I like to say that in mathematics today, everything true is beautiful. Yes.
Speaker 2:So And you've said you encourage your grad students to adapt their way of thinking to this.
Speaker 1:Yeah. You should prove things by any means necessary.
Speaker 2:Totally.
Speaker 1:Totally. Why tie your hands behind your back? But yeah. So that's some kind of move 37, arguably, right,
Speaker 2:which had
Speaker 1:nothing to do with an AI system. Would I consider an AI system generating that kind of proof, like automating Yeah. A huge amount of casework, itself to be a move 37? No.
Speaker 1:Because we've already seen humans Yeah. Do it. I guess you can also imagine some kind of weird, like, Lean proof of some hard problem Sure. With no comments, that itself is very hard to extract a human argument from. Like, if you've ever looked at Lean code, it's not easy.
Speaker 3:Junky. Yeah.
Speaker 1:Yeah. Tough. So yeah. But I think if I were to see something that I considered to be a new technique, like something where I couldn't really find a precursor in the literature, well, again, it's sort of a judgment call as to what a precursor is Yeah. That would be very exciting.
Speaker 2:And I mean, this is part of why I pay attention to your commentary: I think that, like, goalpost moving, post hoc judgment, it's very hard. But you seem to be trying to do an honest job. Yeah.
Speaker 1:I mean, I appreciate it. There's a huge temptation to Yeah. Move goalposts here. And I do think to some extent it's fair: right now, the signal you get from AI solving a problem is partly that the AI is capable and partly that the problem is easy. Yep.
Speaker 1:And so, you know, mathematicians have a habit of saying, oh, this problem was solved by AI, but it was actually easy. So maybe we shouldn't update on
Speaker 2:that. Right.
Speaker 1:To some extent that's true. Of course. I think it should
Speaker 2:be a mix. Yeah.
Speaker 1:Yeah. But yeah, I think we should try to evaluate what the models are doing and try to see, like, oh, well, if a human had written this, would I be excited? And that's what I try to keep in the back of my head here. So, for example, for these recent Erdős problems, like, if a human told me they solved this Erdős problem with this proof which has happened, I would be like, oh, that's cool. Yeah.
Speaker 1:Yeah. And that's also how I feel about
Speaker 2:the Right.
Speaker 1:Right. Oh, that's
Speaker 2:Yeah. Yeah. And, like, life goes on, but but, you know, don't diminish it at
Speaker 1:the moment.
Speaker 3:Yeah. Good.
Speaker 2:You did a thing. Yeah. Did something nice
Speaker 1:there. Yeah.
Speaker 3:I'm curious if you think it's feasible to kinda well, actually, let me step back a bit. So one thing you've mentioned on Twitter is that whether solving these problems is impressive depends a lot on whether or not there was a lot of human effort put into them previously. And do you think it's possible for us to go through these Erdős problems or has it already been done? Like, just to try to say
Speaker 1:Estimate the amount of Yeah. How much effort people have
Speaker 3:put into Yeah.
Speaker 2:Of course,
Speaker 1:you can kind of look at you know, they were proposed in some paper. You can look at how many citations that paper has. With the ones that were solved, I actually don't know. But for the hard version of 124, which was not solved, the paper has 14 citations. So for a 1996 paper, that's, you know, in math that's not a lot of citations.
Speaker 2:I see. Yeah. I don't know what that means.
Speaker 1:It's not a lot, but it's not
Speaker 2:Not nothing. You know,
Speaker 1:there are plenty of papers from 1996 with zero citations.
Speaker 2:Of course, this might be a long paper, so it's unclear.
Speaker 1:It's a paper that poses a lot of problems. So I think many of those are probably not citations
Speaker 2:To the problem of interest.
Speaker 1:Yeah. Yeah. I haven't gone through them, but I would not be surprised.
Speaker 2:Do you know how many people are working on the p-curvature conjecture?
Speaker 1:Very few, I would say. I mean, it's something where yes. For context, the p-curvature conjecture the Grothendieck-Katz p-curvature conjecture is kind of one of my white whales.
Speaker 2:Yeah.
Speaker 1:Yeah. Something I've thought about a lot. I think probably very few people are actively working on it. I would expect I know all of them. Yeah.
Speaker 1:Yeah. Probably it's, you know, fewer than 20 is what I would guess. Historically, there have been more. Uh-huh. I think there was quite a lot of activity in the eighties and nineties and early two thousands.
Speaker 1:Mhmm. Somehow people got stuck and it died down, and maybe now there are some new ideas again. Gotcha. Or at least new ideas in related areas. But, yeah, I think somehow any question where not a lot of people know what the words involved mean is It's automatically gonna be attention bottlenecked.
Speaker 2:Yeah.
Speaker 1:Yeah. Like, here, you know, there are probably a couple thousand people who know what all the words mean. Gotcha. Yeah.
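For readers who want the statement of the conjecture just mentioned, it can be phrased roughly as follows; this is a standard formulation in our own notation, not a quote from the conversation:

Grothendieck-Katz p-curvature conjecture: let $(E, \nabla)$ be a vector bundle with integrable connection on a smooth variety $X$ over a number field. If, for almost every prime $p$, the reduction of $(E, \nabla)$ modulo $p$ has vanishing p-curvature, then $\nabla$ has a full set of algebraic solutions; equivalently, it becomes trivial on a finite étale cover of $X$.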
Speaker 2:Alright. This is a little random, I'm just curious. Like, is there a bit of a selection effect where I know with the Collatz conjecture
Speaker 1:Yeah.
Speaker 2:Like, Wikipedia discusses how it's a little embarrassing to work on. That's how it's characterized.
Speaker 1:It's also kind of a crank magnet I think.
Speaker 2:That too, maybe. And so you don't want the association with that. Yeah. But also I guess there are some famous open problems like this where the expert opinion is we don't really have the right techniques to solve it, and so there's a little bit of a "who are you" like, I don't wanna stick my neck out and say I think I'm good enough to Are there a lot of quiet small attempts on big famous problems that wouldn't be announced?
Speaker 2:Like, is there a silent 10x?
Speaker 1:I think everyone, you know, thinks a little bit about a big problem. It's hard to make an attempt on a problem like that. Like Uh-huh. Like, you need an idea. Yeah.
Speaker 1:Yeah. You know, with the Collatz conjecture, I think many people actually have thought about it. Sure. Including quite well known people who have published papers Yeah. Yeah.
Speaker 1:On Collatz-related topics. So I
Speaker 2:expect so.
Speaker 1:I think it's not really attention bottlenecked, like
Speaker 2:Sure. But I more meant, if you were trying to estimate the amount of attention it had received, how would you account for the fact that, you know, probably everyone has thought about it? Like
Speaker 1:Yeah. I mean, right. So what does it mean to try to solve it? Like, you're like, oh, you know, it would be nice to solve this problem, and then you try nothing.
Speaker 2:Right. Right. Right.
Speaker 1:So like how many people actually try something?
Speaker 2:Or maybe a different angle is, what does it feel like when you're like, oh, maybe I do have an idea worth trying? Of course, it fails most of the time, but not just, "I remembered the Riemann hypothesis today. Oh, well, that was nice." More like, oh, what if this has some bearing on it? Like, how does that come about?
Speaker 2:It might just be Yeah.
Speaker 1:I mean, so right. So usually you well, okay. Sometimes you wake up in the middle of the night. Yeah. Yeah.
Speaker 1:With a great idea. That does happen. But a lot of what I do when I think about a problem I'm interested in is, you start with an idea, you start with a new technique that you kind of came up with in some other way, and you're trying to extract value out of it. So you think, what problems is this relevant to? So that's sort of an opportunistic approach Yeah. To Yeah.
Speaker 1:To mathematics. And what happens sometimes for me is, sometimes I do kind of set out to solve a problem or prove something, or maybe more accurately try to understand something and then benchmark my understanding by proving something relating to
Speaker 3:it. Yeah.
Speaker 1:There, you kind of take the minimal example where you don't know how to approach it via the techniques that you have available to you, and you try to work it out and develop techniques to handle that minimal example, and then you see how far those techniques go.
Speaker 2:You maybe get a little hill climbing effect
Speaker 1:going on. Yeah. And so and then, of course, you can iterate that.
Speaker 2:Yeah. Yeah.
Speaker 1:So I think, yeah, the most common situations for me are, yeah, either a somehow kind of opportunistic approach to a problem Yep. And I think this is, by the way, something where AIs can be Yeah. Kind of helpful. Right?
Speaker 1:Like, in order to resolve a conjecture, you have to know it exists. Uh-huh. And so you develop some technique, and then you can say, well, what is this relevant to? You think, does
Speaker 2:anyone else care about this?
Speaker 1:Yeah. Yeah. So far, this hasn't been useful, but, like, I have resolved open conjectures by developing some technique. And then someone mentions to me, oh, is that related to this? And I'm like, oh, yeah.
Speaker 1:It is. And then that, you know, like Yeah. Yeah. For example, solves a 40-year-old open problem. Very nice. That's yeah.
Speaker 1:Yeah. Yeah. So one possibility is opportunistic, and then the other one is the situation where you kind of try to extract some minimal viable example and just play with it until you get somewhere.
Speaker 2:I guess, from a data analysis perspective, part of what is in the back of my mind is, if we do take something like what Anson suggested, based on citations or whatever, as a metric, would there be some kind of nonlinear correction we need to do? For yeah. That's gonna understate famous problems, because everyone sort of checks, "I have a new technique." Understate. Overstate.
Speaker 1:I mean, I think there are a lot of papers that cite, you know, work on the Riemann hypothesis without making any meaningful progress towards the Riemann hypothesis. Yeah. It's not so clear. I Okay.
Speaker 2:Great. Then it's a wash. Yeah.
Speaker 1:I think it's very hard to evaluate difficulty this way.
Speaker 3:I guess there's also the junk papers problem. The the
Speaker 2:Right. I reckon it's
Speaker 3:citing something more famous. You know? Yeah.
Speaker 1:Oh, yeah. Yeah. There's also, yeah, a huge number of papers claiming to prove the Riemann hypothesis, which obviously don't. Yeah. And there's probably a huge number of computer science papers citing, you know, things relevant to p versus NP.
Speaker 1:Are they really making progress on it? Like, how many people are actively thinking about p versus NP?
Speaker 2:So yeah. Linear approximation it is. Yeah. We'll get by. Before we move on fully to measurement, I remembered one other question for you Mhmm.
Speaker 2:Which is I again, I think you have a clear-headed view on AI, where it's going, where it might be going, you know, reasonable probabilities, non-extreme anyway. Have you made, or are you starting to plan, ways to adapt your own professional life or trajectory to, you know, survive in this world, thrive in this world?
Speaker 1:Yeah. So I would say there are certain things I'm doing with the expectation that AI will become more capable. So, for example, right now there's a lot of work on formalizing mathematics Mhmm. Like in Lean or other proof verification software. And I'm not really working on this, and the reason why is that I expect that tools for vibe coding, or Yeah.
Speaker 1:Vibe formalizing, will dramatically improve in the next few years. So, I mean, it's faster than trying to do it myself. But partly, it's because I'm not very good at it Sure. I'm not an expert in Lean.
Speaker 1:I play around with it a little bit. So that's something I'm doing differently. I don't think I'm adapting which problems I think about or what techniques I'm using based on the expectation of improved capabilities. And I think part of the reason why is that I just don't view my job as, like, proving things. Mhmm.
Speaker 1:I view my job as, like, trying to understand things. Yeah. And then when you prove a theorem, that's like a benchmark of understanding. So that's like Gotcha. You know, there are some theorems, or conjectures, that are interesting because they sort of help you understand
Speaker 3:Yeah.
Speaker 1:A subject, and then proving it actually, you know, improves your understanding Sure. Of the subject. But often, a lot of theorems or conjectures are kind of sinks rather than sources. Meaning, you know, if you've developed a technique that can prove the theorem, it shows you've understood
Speaker 2:something. Yeah.
Speaker 1:But the actual value is in the understanding. And it's sort of funny. I think this is maybe something that explains part of the difficulty in training AI to do good mathematics, which is that a lot of what we write does a poor job of conveying the actual value proposition of mathematics, which is that there's a human who now understands this object better.
Speaker 2:And, like, I don't know. The words in your mind, in the voice in your head, that, you know, amount to or are tightly related to that understanding are not the words that appear in the paper.
Speaker 1:Right. And, of course, you wish you could you try to convey your intuitions, but it's just a famously hard problem. Like, you know, if I could tell my students, oh, this is how you should think about this object. I mean, of course, I do tell them Right. Right.
Speaker 1:Right. But, like, it doesn't convey anything valuable. Like Yeah. It gives them some hint that they can kind of decompress or unpack in some way by playing with the objects themselves. But, you know, you can't directly convey Limited.
Speaker 1:That understanding. It's just not contained in the text, except in some very compressed or hinted-at form.
Speaker 2:I can also imagine, like, you write up a paper that's "here's the proof of this conjecture," and the route to developing that theory is so much more roundabout that, you know, there's just
Speaker 1:Right.
Speaker 2:Not much training.
Speaker 1:Yeah. I mean, a lot of the time. In fact, when you prove something, you often have some very straightforward idea Right. For how the proof should go. And then there are various roadblocks that come up, maybe because, you know, we don't understand some intermediate object Sure.
Speaker 1:In the proof, and so you find some way around it, and the actual written argument looks very ugly. And of course, you try to hint, like, this is what I'm actually trying to do, but, you know, how successful is that? Right. So how is this related to how AI affects my personal Yeah. Planning?
Speaker 1:Yeah.
Speaker 3:Yeah.
Speaker 1:Right? Like, well, AI can't understand something for me.
Speaker 2:Right.
Speaker 1:Right? And because of this issue of the difficulty of conveying intuition, like, even if a model exceeded my capabilities in every dimension
Speaker 2:Right.
Speaker 1:Like, that would probably help me only a little bit in understanding these objects. Yep. But You'd still be stuck.
Speaker 2:But you'd still have to do the work. Yeah.
Speaker 1:You still have to do
Speaker 2:the work. I mean, I saw you say this on Twitter, and I think you were citing a mathy philosopher, Peli Grietzer Mhmm. On the point that, oddly enough, the societal role for a mathematician is to embody mathematical understanding. Yeah. I love this.
Speaker 2:It's beautiful. What a nice actually, I can't tell how much he's bothered by this. Like Yeah. But I'm curious how much you're bothered by it or not. If we're in a world where AI systems can resolve things faster than any human, total AI dominance of human mathematics, would that bother you?
Speaker 2:Or would you still pursue mathematics?
Speaker 1:I mean, part of doing math is, like, you get a rush when you prove Sure. Something that's fun. That thing doesn't necessarily have to be Uh-huh. An open problem to get that rush, actually. Yep.
Speaker 1:So, you know, you lose maybe some ego boost or whatever, but the main sort of emotional aspect is still there. But also, I mean, for me, the actual goal is to understand stuff. In a world in which, like, my primary role is to, as Peli likes to say, embody human understanding, or, you know, where we're running seminars on the latest great result proven by an AI. Yeah. Like, if society is willing to support that activity, I would be pretty happy with it. Like
Speaker 2:If this lasts into the post-scarcity utopia.
Speaker 1:For sure. Yeah. And, I mean, just to be clear, I do think we're quite far from that.
Speaker 2:Sure. Sure. That's at least a question that in some ways isn't super duper speculative, because who knows how long it takes to get there. But if, you know, we're still like ourselves when we get there, then Yeah. We would wanna do that.
Speaker 1:Yeah. And there is, of course, a social question: if models have absolute advantage over humans in all areas of math research, or if the public has the perception that they do, which I think is more likely, will, you know, society be willing to support that activity? I think that's an open question. But, you know, I hope so.
Speaker 2:Sure. Somehow it's lasted. I guess maybe society currently has the perception that math ends up being useful, like, a lot
Speaker 1:of the time. Sometimes. Yeah. That's true. Yeah.
Speaker 1:I mean, I think overall, one way in which math does end up being useful is that there are human experts, and there's human capital developed out of math. So even, you know, people working on the most abstract and pure mathematics, the fact that they embody understanding is valuable. And it's not totally clear. I mean, depending on how creative and, yeah, capable of innovation the future models are, it may still be valuable to have humans embody that understanding, you know, even if, narrowly within math research, AIs have absolute advantage.
Speaker 2:We would love to take, like, the track we've been on making benchmarks Mhmm. FrontierMath in particular being, you know, the work Epoch has done in this and continue to move the y-axis up Right. But it seems like that would be maybe a little naive, because I think, as we've learned from making FrontierMath, that y-axis is not capturing everything Right. That's important. So first of all, just to get it on the table.
Speaker 2:What are the big things that are missing?
Speaker 1:Right. So yeah. Let me say a little bit about what I think a benchmark measures. Yeah. And then we can say what's left over.
Speaker 1:Right? So I think maybe what you're trying to do when you make a math benchmark is you're trying to benchmark, to some extent, knowledge. Like, do you know what the words mean? Do you know what the existing results are? Knowledge of existing techniques.
Speaker 1:Can you apply existing techniques? And then also some kind of reasoning ability and creativity, like Mhmm. Some problems are supposed to be hard, which means that they require some kind of, whatever, reasoning ability. And I think what the benchmarks primarily end up measuring is knowledge. Yeah.
Speaker 1:And so let me dig into that. When a human solves a problem, humans typically have very limited knowledge. Mhmm. So what do they do? They maybe have some idea, then they work on it for a while, and realize, oh, I need this fact, or I need this result as an intermediate step.
Speaker 1:Then you try to prove that result, or you look it up, or whatever. But the activity of proving an intermediate result, or even finding that intermediate result Mhmm. Realizing that it's something that could exist, could be true Yeah. Is a very reasoning intensive activity. If you have memorized the entire mathematical literature, you already know that result exists.
Speaker 1:Much less reasoning is required to realize that it's a true fact that would be useful for proving the thing you're trying to prove or computing the object you're trying to compute. So when you're asking a question of something that has memorized the whole literature Yeah. I think what you're primarily doing is, like, you're not testing that sort of secret reasoning ability that happens when a human with very limited knowledge tries to solve a problem. Yeah. They have to find what knowledge they need; they have to discover something intermediate that was already discovered by someone else, but not by the human in question, and that the model already knows.
Speaker 1:Yeah. So I think a lot of questions that in humans test reasoning ability, and are correlated very highly with mathematical expertise and research success, are much less correlated with that kind of capability in models, it seems like. Yeah. I mean, any human who could do as well as a model head-to-head on FrontierMath, like, maybe no such human in fact exists Right. Would probably be a very successful researcher.
Speaker 1:Totally. Successful in a way we're not seeing the models be, and I think this is part of the explanation. The very same question is testing something different in the models than in a human, because the models require much less reasoning ability to find the intermediate results that they need.
Speaker 2:To add something you may not have seen: we did this deep dive into Gemini 2.5 Deep Think
Speaker 1:Mhmm. Mhmm.
Speaker 2:Its math capabilities, including running it manually on FrontierMath in the web app. The FrontierMath problems all have three ratings associated Yeah. With them, which we called background, execution, and creativity. Execution was something like how long the solution is, just how many nitpicky calculations you have to work through. Background was how advanced and obscure the required background is. And creativity was supposed to be
Speaker 2:I think, of those three, probably the most correlated with this reasoning capability you're describing. We see a negative correlation between Gemini 2.5 Deep Think's scores and background and execution, meaning the higher the background rating, the worse Gemini scored. No correlation with the creativity rating.
Speaker 1:Oh, interesting.
Speaker 2:Which was fun. This was a nice result which I think captured I mean, there's a bit of a selection bias in liking this result, because I think it captures exactly the phenomenon you're saying. Even though we tried to make FrontierMath span this creativity dimension Mhmm. That doesn't seem to be what the models are actually picking up on in terms of which problems they find harder to solve.
Speaker 1:Do you see the same correlation with other models? Yeah. Yeah. I'm just wondering because I think somehow Gemini seems to do less, like, synthetic data. So Gotcha.
Speaker 1:I think yeah. Maybe my sense overall is that Gemini knows less about fancy topics.
Speaker 2:At least, the same pattern for GPT-5
Speaker 1:Very interesting.
Speaker 2:And others that we've looked at, which is yeah. It's cool. But yes. So, yes, indeed, it's missing this piece.
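For concreteness, the kind of rating-versus-score check described here can be sketched in a few lines of Python; this is a minimal illustration, not Epoch's actual analysis code, and the file and column names are hypothetical:

import pandas as pd
from scipy.stats import spearmanr

# Hypothetical per-problem table: one row per FrontierMath problem, with the three
# difficulty ratings (background, execution, creativity) and a 0/1 "solved" flag
# recording whether the model under evaluation got the problem right.
df = pd.read_csv("frontier_math_results.csv")

for rating in ["background", "execution", "creativity"]:
    # Spearman rank correlation between the rating and the model's success;
    # a negative rho means higher-rated problems were solved less often.
    rho, p = spearmanr(df[rating], df["solved"])
    print(f"{rating}: Spearman rho = {rho:.2f} (p = {p:.3f})")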
Speaker 1:Yeah. Right. So I think the models are somehow yeah. Sorry.
Speaker 1:The benchmarks are somehow measuring reasoning ability less in models than they are in humans. Yep. The other thing is, okay, you know, one tries to write a hard problem when one writes a problem for Right. FrontierMath. But, like, realistically, the people doing this are busy.
Speaker 1:They write something that they sort of already know how to do. Yeah. And a problem that a random person X already knows how to do is obviously accessible via existing techniques. Yeah. You know, maybe they came up with some new technique to solve the problem.
Speaker 1:They haven't yet put it in a paper. Right. Whether it's really new or just new to them is a question. Yep. Well, eventually the paper comes out and makes it into training data, and then we're no longer measuring models' ability to develop new techniques.
Speaker 1:Sometimes the hard thing that needs to be done is you need to plug some numbers into a formula, where that formula is in a paper where understanding the words requires a lot of background. But okay, the models have a lot of background. They can go to a paper and plug numbers into a formula in that paper.
Speaker 2:Right.
Speaker 1:Yeah. And so, you know, sometimes you end up testing things like, can you read a PDF?
Speaker 2:Right. Mhmm. Yeah. Totally.
Speaker 1:Yeah. So my sense is that if a benchmark is constrained by what a person can do in a few hours
Speaker 3:Mhmm.
Speaker 1:It's probably gonna be saturated soon, just because what a person can do in a few hours is, you know, very limited. Yeah. Yeah.
Speaker 2:I do think one of the FrontierMath tier four contributors and I'm blocking on his name had mentioned that he was quite proud of his problem, because it just so happened he got, you know, nerd sniped or whatever. And instead of just coming up with something he sort of already knew how to do, he set himself a small research problem Dan Rumik maybe? What he was saying is he set himself what became a two week research project: he set out with, I wanna do something with these techniques but I don't exactly know how to set up what I want. And I don't think he got to a publishable result or anything, but it was more of an exploratory process of, I'm gonna investigate this case and maybe a problem will fall out of it. And that sounded a little nicer to me.
Speaker 1:Yeah. I mean, I think, of course, the longer you get someone Right. To spend on setting a problem, probably that will correlate with problem quality. I do think, you know, there's another sort of trap you can fall into here, which is someone sets out to write a hard problem, and what they end up writing is a problem which is hard for them.
Speaker 1:Yep. And one way that can happen is you just try to write a problem in a field where you're not an expert. Yep. And so you
Speaker 2:Everything seems new and exciting and difficult.
Speaker 1:End up writing an easy problem for an expert in that field. So, yeah, I think, you know, just writing a hard problem and not trying to somehow test its hardness Yeah. Is a recipe for producing problems, or benchmarks, that will be
Speaker 2:saturated easily. Mhmm. So then let's get into this piece of work that Epoch is cooking up right now, which we're provisionally calling our open problems benchmark. This is something we hope to release in January, just to give a sense of time scale. The goal is to find open math problems that, at least today, no human knows how to solve.
Speaker 2:We're still for practical reasons unfortunately constrained by automatic verifiability
Speaker 3:Right.
Speaker 2:Excluding Lean Mhmm. Mhmm. where we want, for every problem — even though no human knows the answer — that if an AI system were to come up with a purported answer, we could either verify it 100% or at least get very strong computational evidence that it had hit on the solution. And there are a number of practical concerns with that, but it's meant to get around Yeah.
Speaker 2:The problem of we don't know how difficult a problem is. Well, we do if people have tested it or we have some more information. Right. Right. I mean, I guess I'm sort of curious.
Speaker 2:I'm curious for your thoughts on all sorts of aspects of this. One I might start with is how reliable might these difficulty estimates be and what sort of range are we gonna see there.
Speaker 1:I guess it is true that a lot of open problems are attention bottlenecked. Mhmm. So it could be the case that an open problem is somehow not as hard. I think one useful thing to do is just to get mathematicians to say, oh, I think this is hard. Right?
Speaker 1:Because that sort of prevents goalpost moving. Yep. One thing I like, by the way, about this kind of project is that, like, right now, you know, a lot of labs are devoting a lot of resources to trying to solve some problem so they can say, we solved Right. this problem. It would be nice if some of those resources went to problems that people actually care about.
Speaker 1:Sure. So it's nice to incentivize that. You know, I think some of the labs are doing what I would consider to be real science, and some of them are primarily doing PR. Sure. I won't name names.
Speaker 1:Sure. But, yeah, I think it's, you know, it would be it would be nice to to align the incentives so that, you know, more of them are doing real science.
Speaker 2:It also feels like this at least gives us — as you said — a pre-registration Right. of Mhmm. you know, a population of problems. Otherwise it's hard to tell, when a lab says, hey, our model solved this problem — okay, how cherry picked was that?
Speaker 2:Right. Right. Like, how many failed problem solves were there? And so if we get—
Speaker 1:something — I remember there was this paper that came out of OpenAI Mhmm. the early acceleration one Yeah. Yeah. blah. I think they went through some conference proceedings and looked for problems.
Speaker 1:Yep. And if I remember, they maybe looked at 10 different ones Sure. that they got it to solve. So, yeah — that's, you know, it's a good way to
Speaker 2:It's a
Speaker 1:way to do it, to explain that they're doing that. It gives you some sense of: was this an easy problem, or is the model very capable? And, of course, if you are looking at a thousand problems and you solve one, versus looking at 10 problems and you solve one — Different. Yeah.
Speaker 1:Yeah. I do think I should say, I don't think that paper gave good evidence that there's acceleration happening. Sure. It's just — Yeah.
Speaker 2:The, you know, the denominator wasn't Yeah. so ridiculous. Yeah.
Speaker 1:Yeah. As with all these stories, there are some complicating factors Totally. that make it — you know, there's some question as to whether the problem solved was precisely the one posed. Mhmm. Mhmm.
Speaker 1:Whatever.
Speaker 2:Yeah. So good. If if we can get a sense of mathematician saying, I think this would be interesting
Speaker 1:Yeah.
Speaker 2:How do you characterize that? Like, what what scale are we using here? Yeah. I know that's my job, but I'm curious with you.
Speaker 1:Maybe someone yeah. You can get someone to say, is this interesting? Is it very interesting? Yeah. Extremely interesting?
Speaker 1:You could get people to talk about what the consequences are. Right? Like so sometimes the problem is interesting because it is a source. Right? Mhmm.
Speaker 1:It Yep. It implies some interesting stuff. Some some construction I mean, I think that the as I understand the benchmark proposing, you're looking for some kind of construction that can be verified.
Speaker 2:Usually, yeah — I wouldn't say we're looking for constructions per se, but we are looking for
Speaker 1:sort of verifiable.
Speaker 2:Right — because we have this verifiability constraint, it's usually: construct an object.
Speaker 1:Yeah. So some some constructions of this nature, like, you know, for this some constructions relevant to matrix multiplication
Speaker 2:Mhmm.
Speaker 1:Would, you know, give you a faster matrix multiplication algorithm or something where they would have genuine consequences. Maybe people care about it for that reason. Yeah. I think some of these are still the tension bottleneck, but whatever. Sure.
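As a rough illustration of what "a construction relevant to matrix multiplication" means here (my own sketch, not something from the conversation): such a construction is a bilinear scheme that multiplies matrices with fewer scalar multiplications, and, importantly for the benchmark, a candidate scheme is mechanically checkable. Strassen's classical 7-multiplication rule for 2x2 blocks stands in below for the kind of object one would verify.

```python
# Hedged illustration: Strassen's 7-multiplication scheme for 2x2 matrices,
# verified against the naive 8-multiplication product. The point is only that
# this type of construction admits a cheap automatic check.
import random

def strassen_2x2(A, B):
    """Multiply 2x2 matrices using 7 multiplications instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

def naive_2x2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

for _ in range(1000):  # randomized check; an exact symbolic check also works
    A = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    B = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    assert strassen_2x2(A, B) == naive_2x2(A, B)
print("7-multiplication scheme agrees with the naive product")
```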
Speaker 1:Mhmm. Some questions are sinks, right? Meaning, maybe they are supposed to serve as a benchmark for understanding. I see.
Speaker 1:I mentioned this example of Euler's sum of Uh-huh. powers conjecture, and in some sense, because a brute force search is infeasible, that problem kind of benchmarks understanding: in order to solve it, you have to come up with some kind of clever search, so you've understood something about the search space.
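A small sketch of my own (not from the conversation) of why this kind of problem separates verification from search: the classic Lander-Parkin counterexample to Euler's sum of powers conjecture, 27^5 + 84^5 + 110^5 + 133^5 = 144^5, checks instantly, while a naive search scales like bound^4. Analogous questions for higher exponents are, as far as I know, still open, which is the sense in which a solution certifies a clever search.

```python
# Cheap to verify, expensive to search naively.
from itertools import combinations_with_replacement

def verify(terms, total, exp=5):
    """Does sum(t**exp for t in terms) equal total**exp?"""
    return sum(t ** exp for t in terms) == total ** exp

assert verify([27, 84, 110, 133], 144)  # Lander-Parkin (1966), found by computer search

def naive_search(bound, exp=5, k=4):
    """Brute-force search for k exp-th powers summing to an exp-th power.
    Cost grows like bound**k, which is why the real content is a clever search."""
    powers = {n ** exp: n for n in range(1, 2 * bound)}
    for combo in combinations_with_replacement(range(1, bound + 1), k):
        total = sum(c ** exp for c in combo)
        if total in powers:
            return combo, powers[total]
    return None

# naive_search(150) would eventually rediscover the counterexample above, but
# already runs ~2e7 combinations; larger exponents are hopeless this way.
print(naive_search(40))  # quick demo: finds nothing at this small bound
```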
Speaker 2:Do you think those sink-benchmark kinds of problems would be fair measures of AI understanding? I'm hoping that, given that they're unsolved by humans, there's not, like, a super straightforward cheat way to do them — you can't just mine the literature and find the obscure paper. Maybe we will find a couple such cases.
Speaker 1:Yeah. I think it depends a lot on the problem. You know, some of these things are just attention bottlenecked. Yep. Again, for a lot of the kinds of constructions people look for,
Speaker 1:I think the state of the art is, like, someone ran Right. a naive search on their laptop over the weekend. So if you find a better construction — okay, that's great, but you tried one thing rather than zero things, two things rather than one thing.
Speaker 2:Right.
Speaker 1:I think doing that clearly has value, but it's a little hard to get a sense of what it's really showing, like, what kind of difficulty. Yeah. But there are definitely some questions — like the inverse Galois problem we talked about for the group M23 — where I think, again, if there were a Manhattan Project to solve this, it would be solved. It's definitely, I think, within reach. Yeah.
Speaker 1:I mean, in some abstract sense. Sure. Totally. Not that I have an idea. But, you know, people have definitely tried.
Speaker 2:Yeah. Yeah.
Speaker 1:So you get some signal from that.
Speaker 2:I guess one question I'm sort of curious about is whether it's worth considering interestingness or value as distinct from difficulty. Like, I guess these are—
Speaker 1:Different dimensions. Yeah.
Speaker 2:I suppose they correlate, because people try the interesting problems, so the interesting problems that remain unsolved happen to be hard. A collider, or whatever the causal term is. But, yeah, how tightly is this correlated? And Yeah.
Speaker 1:I mean, I think you've explained the source of
Speaker 2:I see.
Speaker 1:Correlation between these two things. There are some fun examples where you can see something like this. So there was a recent Polymath project run by Terry Tao, this Equational Theories Project
Speaker 2:I see it.
Speaker 1:Where they were studying axioms for multiplication. So they're studying small magmas, whatever
Speaker 3:that means. So
Speaker 1:so you look at all possible axioms for a multiplication rule and ask, you know, which ones imply which others. Mhmm. So maybe an example is x times y equals y times x. Sure. That's commutativity.
Speaker 1:Yeah. You can ask, well, what other axioms of multiplication follow from that? Uh-huh. And so they looked at all of them up to some fixed length and asked which ones implied each other — maybe over 22,000,000 Uh-huh. possible implications — and computations have now resolved all but a few of them.
Speaker 1:And so the vast majority of these were able to be solved by automated techniques. Yep. So to some extent with LLMs Oh, interesting. but Sure. primarily other computer software.
Speaker 1:I mean, you're trying to manipulate some Right. simple—
Speaker 2:It's old fashioned AI. Yeah. Good candidate.
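A rough illustration of the automated end of that spectrum (my own simplified stand-in, not the project's actual tooling): to refute "law A implies law B" about magmas, it is often enough to search small finite multiplication tables for one satisfying A but violating B.

```python
# Refute an equational implication by brute-force search over tiny magmas.
from itertools import product

def commutative(op, n):
    return all(op[x][y] == op[y][x] for x in range(n) for y in range(n))

def associative(op, n):
    return all(op[op[x][y]][z] == op[x][op[y][z]]
               for x in range(n) for y in range(n) for z in range(n))

def find_countermodel(holds, fails, n):
    """Search every binary operation on {0,...,n-1} (there are n**(n*n) of them)
    for one where `holds` is true and `fails` is false."""
    for flat in product(range(n), repeat=n * n):
        op = [list(flat[i * n:(i + 1) * n]) for i in range(n)]
        if holds(op, n) and not fails(op, n):
            return op
    return None

# Commutativity does NOT imply associativity: a 2-element counterexample exists.
print(find_countermodel(commutative, associative, 2))
```

The implications that survive this kind of cheap refutation and cheap automated proving are exactly the handful that, as described next, required genuinely new ideas.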
Speaker 1:And then some of them required really nontrivial Mhmm. Ideas. Yep. So and I I think this sort of led to some some, I think, reasonably interesting mathematics. Sure.
Speaker 1:Well, it's, of course, of this shape you're you're getting at.
Speaker 2:Like, you
Speaker 1:know, you if if the the hard problems will also be the interesting ones. Right. Right. Right. Well, you the first thing you try doesn't work and the second thing you try doesn't work.
Speaker 1:Eventually, you
Speaker 2:get There's something new. Technique. Yeah.
Speaker 1:Yeah. So I think, for example, there's this notion of something called magma homology, invented to Interesting. resolve certain of the implications, which, to me, seemed like an interesting Yeah. idea — seemed neat.
Speaker 2:Yeah. Totally.
Speaker 1:There's a sort of funny thing here, which is: if you were to ask before this project — if you were to ask, like, the median mathematician, is this an interesting project? You know, I don't know. But I think a posteriori it clearly is interesting, in part because there ended up being some difficult implications, and resolving them required new ideas. So, I mean, my personal feeling is actually, if you go far enough in any direction, you'll get something both interesting and difficult. Yeah.
Speaker 1:And sometimes it's difficult because it's interesting Yeah. As you said. So you're like, you kind of you work until it's because it's interesting you work until you get something hard. Right. But sometimes it's interesting because it's difficult.
Speaker 1:Right?
Speaker 2:Yeah. Yeah. It's a Why is this Yeah. Resisting our standard techniques or whatever?
Speaker 1:There's some interesting phenomenon here, like maybe, you know, A doesn't imply B, but it's close to B in some way; there's some obstruction to showing it. Yeah.
Speaker 2:We touched on this a little before, but the goalpost moving is, like, an interesting thing to
Speaker 1:Yeah.
Speaker 2:You know, to grapple with. There are some cases where it sounds like: look, you didn't expect there to be a boring way to solve this problem, but looking at the AI system's solution, there are no new ideas in sight, not within a mile. It's a grind, maybe, that the AI system went through. It's totally a proof, good for it.
Speaker 2:It resolved a question we were curious about, but it doesn't seem relevant for capabilities forecasting Right. which is in the back of our heads, like the big Mhmm. the big bogey. Are there any ways we can get ahead of that? Like, what's the way to do this sort of post hoc judgment responsibly?
Speaker 2:Do you have any thoughts on that?
Speaker 1:Yeah. No. I mean, I do think I want to argue that if you're solving an open problem that is not, like, super attention bottlenecked Yeah. and the proof is a grind or whatever, that's okay. Yeah.
Speaker 1:I think you shouldn't necessarily say, like, this doesn't demonstrate anything. Yeah. Like, the ability to grind is a valuable skill as a mathematician. Sure. Mhmm.
Speaker 1:Yeah. But yeah. I think you can post hoc try to categorize the Sure. the solution. But, yeah, I actually think solving a hard problem via grind, that's signal.
Speaker 2:And mathematicians would do would do this if they could. Right?
Speaker 1:Right. Like, should we say of the proof of the four color theorem, well, nobody cares because it's a grind? Actually, some people do say this. Yeah. They're wrong.
Speaker 2:That's interesting.
Speaker 1:Yeah. And I mean, I do think there's a long tradition in mathematics of goalpost moving. Like, if you think about it, a lot of the great mathematicians of the eighteenth or nineteenth century were great calculators. Yeah. And, you know, a lot of what they did, any eighth grader can now do with their TI-84.
Speaker 1:Yeah. And so now we kind of think of that kind of tabulation and calculation as not as interesting. I think we're right to think that. Sure. Right? We are tool users.
Speaker 1:We're allowed to use tools.
Speaker 2:Yeah. I I guess the the the question is it's fair play for doing interesting math.
Speaker 1:Yeah. Capability.
Speaker 2:But for capability forecasting is sort of the question. Yeah. If we look at this and, like, it's doing the same thing — the P6 on the 2024 IMO, which AlphaProof got: it's shocking what a boring proof Yeah. it looks like.
Speaker 2:Like, it's just case
Speaker 1:I think sometimes, you know, sometimes you don't know a problem is easy until you solve it. Sure. That that happens. Yeah. And it happens for human mathematicians too.
Speaker 1:I mean, you know, I mentioned earlier, like, last year Aaron Landesman and I solved a forty-year-old open problem. Like, we didn't publish that in Annals. And Right. that's because the solution was not Was not that interesting. Yeah.
Speaker 1:That happens. And, yeah. So I think maybe the thing to do is look at it and say, okay, we have to have some principled way of deciding whether there's a really new idea here. A post mortem rubric.
Speaker 1:Yeah. You can wait five years and see like how many, you know, how many new results are proven using these ideas like
Speaker 2:That's know,
Speaker 1:that's maybe an example. We talked earlier about the Kakeya conjecture over finite fields. That introduced the polynomial method, which has been hugely influential, and that paper was published, I think, in Annals, and that's Yeah. very well justified.
Speaker 2:Yeah. Yeah.
Speaker 1:Yeah. Earned it, you know, post hoc — by the
Speaker 2:Productivity built on top
Speaker 1:of it. Exactly. Right. Whereas, you know, I think with a grind solution to IMO P6, nothing is going to come out of that idea. Right.
Speaker 2:Mhmm. The the the difficulty measure here that we've sort of been thinking of that I I've been thinking of is something like, we talked about this already, how many how many mathematicians for how long? Yeah. Maybe some notion of seniority of, you know, like a fit of mathematician for the area.
Speaker 1:Uh-huh.
Speaker 2:Uh-huh. Like if I if I said, know, one or two you know, early career mathematicians tried to solve this problem and failed. Maybe they they've proven that like whatever. They they've got their PhDs. They've published some papers in the journals in their field, but not annals or something like that.
Speaker 2:It like how big a step forward is an AI solving a problem that escaped such a team, small team or individual
Speaker 1:Yeah.
Speaker 2:From where we are now. Like, is that a is that
Speaker 1:That's a good question. I mean, I think there's I think it's sort of unclear to me, like, how much low hanging fruit there is
Speaker 2:Yeah.
Speaker 1:of this sort in mathematics. Like, I think it's quite possible that there's a huge amount. Mhmm. Maybe the fact that AI systems have not already started to resolve interesting open questions Mhmm.
Speaker 1:So it may be that—
Speaker 2:Of a certain level of interestingness.
Speaker 1:At that level, yeah — it's some mild evidence against this. Mhmm. But I think, you know, in some sense every question is attention bottlenecked. Right? Like Sure.
Speaker 1:Especially in, you know
Speaker 2:You throw a Manhattan Project at anything and, you know, most of them get solved.
Speaker 1:Yeah. I think this is quite possible, actually. In that world, if that's correct, then I expect we'll see a huge, huge advance from AI systems. And, also, somehow, this benchmarking will be easy. Right?
Speaker 1:Because you'll put any open problem on it, and it will eventually be solved, and that will be evidence that lots and lots of problems at that level will soon be solved. On the other hand, I think maybe there's some mild evidence that not everything is attention bottlenecked — that there isn't that much low hanging fruit, that when people actually work on stuff, it either gets solved or it's hard.
Speaker 2:What what do you have in mind for for why it feels that way? Might feel that way?
Speaker 1:I just have in mind like the fact that it's sort of we talked about a couple examples
Speaker 2:Yeah.
Speaker 1:Like, well, there are no open conjectures with very short proofs. Yep. There's very few—
Speaker 2:Or not so many. I see.
Speaker 1:I think you would sort of expect, in a world where we were really bad at picking up low hanging fruit, to see some evidence that
Speaker 2:we do sometimes Sometimes. Yeah.
Speaker 1:People still— It does absolutely happen, but it's pretty rare,
Speaker 2:I think. Compared to other Yeah.
Speaker 1:I think I think typically, you know, a a sort of resolution of a important conjecture doesn't just introduce, you know, a short new idea,
Speaker 2:but Yeah.
Speaker 1:Many ideas or relies on lots of other developments in the field. And, you know, you kind of can visibly look at the the the resolution and see some advances Yeah. Required. Yeah. I do think, you know, we're not always that good at at seeing it.
Speaker 1:Like, sometimes, you know, there's results that are, you know, sometimes you prove a result because the thing the last thing you needed was just proven recently and Yeah. Finally slotted into place. Sometimes you see results where, you know, something was proven and it had been in the literature for twenty years.
Speaker 2:And then someone realized the consequence.
Speaker 1:Yeah, sometimes. I think all of us have examples of papers that come out and we're like, oh, I knew the main idea. Yeah. If I'd just realized, you know. So, yeah.
Speaker 1:There's one of my favorite papers, actually, by Nises and Schindler — the main idea is in a MathOverflow answer that someone wrote in response to a question I'd asked a few years earlier, and I was kicking myself.
Speaker 2:Yeah. Yeah. Yeah. That's fun.
Speaker 1:Of course, you know — so it happens. It does happen a lot. Like, there clearly is some low hanging fruit of this nature. But, yeah, I think there's weak evidence that it's not everything.
Speaker 2:Good. For this benchmark, we have this kind of binding, annoying, like unnatural constraint
Speaker 1:Right.
Speaker 2:Of automatic verifiability, meaning we want a computer program to tell you whether the answer is right or not. Right. How bad is that? And I think maybe one angle — though answer however you want — but one angle is, like, how correlated are the problems that admit such Yeah.
Speaker 2:Verification with, like, other dimensions?
Speaker 1:Yeah. I mean, in principle, it's not a constraint. Right? Any mathematical construction, you know — okay. Yeah.
Speaker 1:Modulo issues of incompleteness, whatever — you could imagine accompanying it with a proof that you verify. But I think in practice it's a real constraint, because, you know, you have limited resources to get people to write code that verifies Right.
Speaker 2:Very much.
Speaker 1:the software, and maybe you don't want to ask the model itself to write it.
Speaker 2:Those are exactly our constraints. So we're limited to what a more-or-less regular old computer program can verify.
Speaker 1:Yeah. So one constraint this gives you is — I mean, there are a lot of fields and interesting questions which are just simply not of this nature. So Yeah. I think maybe you showed me a list of problems.
Speaker 2:Yeah.
Speaker 1:Maybe one of them was in algebraic geometry. I think, you know, there are certainly areas of algebraic geometry where simply no question has this Yeah. nature. And same with areas of number theory, although there is a lot of beautiful computational number theory where one has a chance. I think the primary way this limits you is in interest.
Speaker 2:Mhmm.
Speaker 1:Right? So there are things like that problem, where one wants to make a single verifiable construction and Mhmm. people are very interested in it. But with rare exceptions, you want to make an infinite series of constructions. Yes.
Speaker 1:And those are, you know, of course, it's much harder to verify an infinite sequence.
Speaker 2:In some cases, it seems like — well, I'm curious on vibes. One problem class I've gotten excited about is merely asking for a sort of zero-knowledge proof: maybe you have an infinite sequence Mhmm. where it's like, I'm not going to be able to check that you've really got something here, but do it for 297. Sure. And, like, that's—
Speaker 1:I think that's a great way to do it. I mean, but again, the issue is you very quickly run into practical Yeah. constraints unless you can verify extremely rapidly. Yeah. Actually, all you're asking for is the first five Yes.
Speaker 1:Yeah. Exactly. Even for the inverse Galois problem, you know, there are infinite sequences of simple groups you could ask for, but probably beyond the first three or four, it's not practical. Yeah. So, yeah, I think there are very serious constraints here.
Speaker 1:Yep. And so yeah. So right. What what I think one hopes with a with a benchmark like this is that the ability to produce an example is actually a proxy for some kind of understanding or
Speaker 2:Yep.
Speaker 1:If there's some clever search happening. And I I think it's often unclear whether whether that's the case. You know, sometime, I think there there are some cases where we have evidence again that humans have tried and kind of failed that like we sort of know we're missing something. Or perhaps where humans have succeeded at similar problems Yeah.
Speaker 2:There's been some insight Yeah. Otherwise nothing's come out.
Speaker 1:Yeah. Yeah. Like I said, there are definitely examples of problems where every new construction requires a beautiful new idea, and Yeah. Yeah. if you can get another example, you hope there's a
Speaker 2:new idea behind it. Yeah. Exactly. I sent you a list. I was curious.
Speaker 2:I mean, I think we're aiming to span a difficulty range. We'd love this benchmark to be continuous Right. so we can track progress like that. I guess our risks on the easier end are: oh, the postdoc who wrote this paper was having a bad day when they considered this question Right. and actually that's no harder than an IMO P1.
Speaker 1:Mhmm.
Speaker 2:In some ways, whatever, we'll defeat this with statistics. Like Right. it's not so bad to Yeah. find some of these. I'm also curious on the difficult end, because the math that is most interesting, most difficult, often has more abstraction to it. Right.
Speaker 2:Often — I don't know if the problem
Speaker 1:might be a Moravec's paradox issue.
Speaker 2:I see — so, yeah. Sorry, say more?
Speaker 1:Oh, I just mean, like Yeah. Right. one way math can be difficult is that it requires high reasoning effort or whatever. Yeah. Another way is that you need to learn what a lot of words mean.
Speaker 1:Sure. You know, Yeah. like, hold a huge kind of Yeah. Yeah. Yeah.
Speaker 1:theory in your head. Sure. Yeah. And so I think it's possible that, you know — algebraic geometry maybe has a reputation of being very hard. Yeah. I think it's quite possible that that's not due to anything intrinsic to the subject, but just to the fact that humans are kind of bad at it and very few humans are
Speaker 2:Yeah.
Speaker 1:Yeah. In it at all. Gotcha. Gotcha.
Speaker 2:Yeah. Right. Attention bottlenecks strike at all points Yeah. in the space. Of the problems I sent you, I was curious whether you had any gut sense of, like, oh, this one is actually much harder, much more interesting.
Speaker 1:So there are a few — yeah. So one of the problems was the inverse Galois problem for M23, and that I think I can certify as a problem I would be very excited to see solved, even though I think it's, you know, sort of within reach of a Manhattan Project. Sure. It would be a big deal if—
Speaker 2:Yeah.
Speaker 1:And, you know, I would be very excited if humans solved it; I'd be very excited if AI solved it. There were some problems about irrationality. Mhmm. So in, I guess, the late seventies maybe That's right.
Speaker 1:Apéry proved the irrationality of zeta of three, the sum of the reciprocals of the cubes of the positive integers. And it was sort of a magical proof. He explained the proof at, I think, the Journées Arithmétiques conference, and no one could believe it. And it's quite short, actually.
Speaker 2:Yeah. Yeah.
Speaker 1:People went home and checked it, and it was amazing. And people later realized it was connected to some deep theory — the theory of G-functions. Uh-huh. And so, as I understand it, the problem is to find some kind of sequences of integers or Mhmm.
Speaker 1:another way of saying it, power series, satisfying the properties that would let you run this argument for other interesting constants.
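A hedged illustration of my own (not from the transcript) of the kind of sequences Apéry's proof needs: two sequences a_n, b_n defined by the same simple recurrence, with b_n/a_n converging to zeta(3) so fast that, combined with bounds on the denominators of b_n, irrationality follows. Finding analogues for other constants is the open problem being described.

```python
# Apery's recurrence and the rational approximations b_n/a_n to zeta(3).
from fractions import Fraction

def apery_approximations(N):
    """Yield b_n/a_n for n = 0..N, where both sequences satisfy
    n^3 u_n = (34n^3 - 51n^2 + 27n - 5) u_{n-1} - (n-1)^3 u_{n-2},
    with a_0 = 1, a_1 = 5 and b_0 = 0, b_1 = 6."""
    a = [Fraction(1), Fraction(5)]
    b = [Fraction(0), Fraction(6)]
    for n in range(2, N + 1):
        poly = 34 * n**3 - 51 * n**2 + 27 * n - 5
        a.append((poly * a[-1] - (n - 1) ** 3 * a[-2]) / n**3)
        b.append((poly * b[-1] - (n - 1) ** 3 * b[-2]) / n**3)
    return [bn / an for an, bn in zip(a, b)]

ZETA3 = 1.2020569031595942  # zeta(3), for comparison
for n, approx in enumerate(apery_approximations(6)):
    print(n, float(approx), abs(float(approx) - ZETA3))
# The error shrinks by roughly a factor of (1 + sqrt(2))**8, about 1150, per
# step -- fast enough, with denominator bounds, to force zeta(3) irrational.
```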
Speaker 2:That's right. Re reapply this Yeah. Style of proof.
Speaker 1:And there's certainly — I think that's very interesting in the sense that there's certainly been some effort devoted to it. So Zagier, who's a very, very serious mathematician, Yeah. Yeah. did a lot of computations trying to find analogous sequences or power series, with some mild success.
Speaker 2:Yeah. Yeah.
Speaker 1:Maybe there were six or seven examples he found.
Speaker 2:But none of them, like, things you would have picked ahead of time as, like, the the most important or the
Speaker 1:Most interesting, yeah. And then there's been very recent progress in this area Yeah. like Calegari, Dimitrov, and Tang, where, instead of finding new sequences, they found new ways to apply the same general method
Speaker 3:Sure.
Speaker 1:With, you know, many many meaningful innovation Yeah.
Speaker 2:Yeah.
Speaker 1:To let you do this with other sort of broader class of sequences. Right. Although, maybe it's not, I think, it's not exactly clear to me exactly what collection of of properties one needs I see. To run this argument. It seems like there's a little bit of artisanal work that needs to be
Speaker 2:done in any case.
Speaker 1:Yeah. So I think this can be certified as hard, in that a lot of people have thought about it. Yeah. Yeah. And I think there are some theoretical reasons to suspect that, at least for the techniques that have previously been used Mhmm.
Speaker 1:like, there are limits in terms of how far they can be generalized.
Speaker 2:Right. That, I mean, is another edge — another challenge we face with this benchmark is that we want the problems, the way we're posing them, to actually be solvable.
Speaker 1:Right. You're trying
Speaker 2:to tell us You're trying to
Speaker 1:pose true you're trying to ask questions which have an
Speaker 2:answer Yeah. Exactly.
Speaker 1:True conjectures. That's already a very hard thing to do — to figure out what's true, let alone prove it.
Speaker 2:Right. I'm curious, like — do you have a sense, if one mathematician familiar with an area, otherwise respected by their colleagues, tells us that they're 80% sure that this problem resolves in this direction
Speaker 1:Mhmm.
Speaker 2:Though they don't know how to, you know, they can't produce the construction themselves. How much credence do you give that?
Speaker 1:I would say it's slightly better than random chance, but not — yeah. People change views all the time. Yeah. There are a lot of examples. I think one of the problems on your list was finding an elliptic curve of rank at least 30.
Speaker 3:Mhmm.
Speaker 1:For a long time, everyone in the field believed that the ranks of elliptic curves were unbounded. I think now maybe the majority of people believe that the the ranks are bounded. Oh, really? Yeah. So there's a lot of and I haven't done a poll, but
Speaker 2:Yeah. Yeah. There's just
Speaker 1:a fair amount of recent heuristic work suggesting that, although I think some people have doubts. Fascinating. Yeah. So I think, you know, even people's views of the truth Yep. are just guesses as to the truth value.
Speaker 1:If you guys have heard of the Hodge conjecture — I think probably most algebraic geometers believe it, but a lot do not.
Speaker 2:And then in some cases, we get proofs that some construction exists, but no one's been able to exhibit it. Right. Those are, like, the ideal case, but perhaps a minority.
Speaker 1:Maybe you can
Speaker 2:Also, maybe less interesting. I don't know.
Speaker 1:No. I mean, I actually quite like that kind of thing. Like, for example, there are bounds on Ramsey numbers Yeah. from the probabilistic method. You can ask for explicit constructions.
Speaker 1:I think that's a very interesting question, although it's far from my area. Yeah. Yeah. And one of the problems I'm working on now is like that: Serre asked for some examples of objects that have been constructed via very inexplicit techniques. I see.
Speaker 1:And I've been working to find explicit constructions. I think that's a very fun thing that I mean, it's very different to do something explicitly and inexplicitly. Like, it gives you a lot of insight. So I I actually am very in favor of that kind of problem.
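A small sketch of my own of the probabilistic-method bound being referenced: Erdős' 1947 argument shows R(k,k) > n whenever C(n,k) * 2^(1 - C(k,2)) < 1, because a uniformly random 2-coloring of the complete graph K_n then has, in expectation, fewer than one monochromatic K_k — so some coloring has none. The proof exhibits no coloring at all, which is exactly why asking for explicit constructions is interesting.

```python
# Nonconstructive lower bounds for diagonal Ramsey numbers via the expectation
# argument: R(k,k) > n whenever C(n,k) * 2^(1 - C(k,2)) < 1.
from math import comb

def erdos_lower_bound(k):
    """Largest n certified by the argument, i.e. a witness-free proof that R(k,k) > n."""
    n = k
    # C(n+1,k) * 2^(1 - C(k,2)) < 1  is rewritten in exact integer arithmetic:
    while comb(n + 1, k) * 2 < 2 ** comb(k, 2):
        n += 1
    return n

for k in (5, 10, 20):
    print(k, erdos_lower_bound(k))  # grows roughly like 2**(k/2)
```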
Speaker 3:What are the heuristic arguments — or good examples of these heuristic arguments — that make people update without actually having the full proof or, like, something really concrete? Yeah.
Speaker 1:That's a good question. So there's a long history in number theory of making random models of number theoretic objects. For example, you could think maybe the prime numbers are distributed like a random collection of integers: you list all the properties you can think of that the prime numbers satisfy, then you try to make a random sequence with those properties, and then ask, well, what properties does that random sequence satisfy with probability one? And then you say, oh, maybe the same things are true of the primes. Or you might list all the properties you can think of of an elliptic curve, and then try to make a random model of an object with those properties.
Speaker 1:And you ask, what properties does it have? Maybe a very basic situation: you have a specific n by m matrix that comes from some geometric or arithmetic situation, and you say, well, maybe it should behave like a random n by m matrix. So, for example, there's beautiful work by Melanie Matchett Wood and collaborators studying random matrices of integers, and they use that to make a number of beautiful number theoretic predictions about, you know, elliptic curves and related objects. And so, yeah, I think that's the kind of heuristic people have in mind.
Speaker 1:So there's a theorem behind it. You know, the theorem says a random object satisfying properties X, Y, and Z has such and such properties: it has, you know, some property with probability one, or with probability one such-and-such invariant of it has the following distribution. And then — I don't know.
Speaker 1:Maybe you have some way of producing an abelian group out of a number field — the classic example is the class group of a number field. Well, you say maybe that behaves like a random abelian group generated according to some distribution. Mhmm. And then one expects that if you just list all number fields in order — sorry.
Speaker 1:Number fields are objects where you can add and multiply and divide, like the rational numbers. So they're finite extensions of the rational numbers, for those who know.
Speaker 2:We've long left the realm of defining most of the things
Speaker 1:we've Yeah. So you expect that if you just list them in some natural order, the proportion of their class groups satisfying property X, Y, or Z behaves like that of a random group satisfying all the properties you can think of — and that's the Cohen–Lenstra heuristics.
Speaker 2:Mhmm.
Speaker 1:Yeah. So that's one of the driving forces behind a lot of number theory, or arithmetic statistics, right now. And so, yeah, the heuristic work is: you come up with a random model and prove some properties it has, and then you try to prove that the actual, non-random objects you care about have the same distribution or the same behavior.
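A toy version (my illustration, not something computed on the podcast) of the random-model heuristic just described, in the style usually attributed to Cramér: treat each integer n ≥ 3 as "prime" independently with probability 1/log n, then compare statistics of the random set with the actual primes.

```python
# Cramer-style random model of the primes vs. the real primes.
import math
import random

def cramer_model(limit, seed=0):
    """Random set in which n is included with probability 1/log n."""
    rng = random.Random(seed)
    return [n for n in range(3, limit) if rng.random() < 1 / math.log(n)]

def primes_up_to(limit):
    """Plain sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return [n for n in range(limit) if sieve[n]]

N = 100_000
print("actual primes below N: ", len(primes_up_to(N)))
print("random-model 'primes':", len(cramer_model(N)))
# The two counts land close together (both near the logarithmic-integral
# prediction). Predictions about gaps, twin "primes", and so on made in the
# random model are the flavor of heuristic one then tries to prove for the
# real primes.
```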
Speaker 2:If an AI system wiped the floor with this benchmark Mhmm. With problems of the kind that, you know, we've discussed or you've seen I I've shared with you. What what sort of world do you think we're living in?
Speaker 1:I would be very excited. First of all, presumably Yeah. the constructions it did find would be interesting at least some high proportion of the time.
Speaker 2:Yep.
Speaker 1:I think there's a good question as to how we should expect it to correlate with other capabilities. It's not a 100% clear to me. So I'm not you know, there with these sorts of constructions, there's kind of a clear reward signal. You can kind of imagine you can try to kind of imagine trying to directly train a model to target this kind of construction. And I it's unclear to me whether that reward signal would generalize to, like, prove proofs, for example.
Speaker 1:Sure. I mean, of course, one can also verify proofs and
Speaker 2:Right. Right. But maybe you have to wait for that machinery to be better set
Speaker 1:up before you Yeah. Maybe it's a different on it. Yeah. I mean, I think in general, a big open question for me about the future of math capabilities is just how much generalization is happening. Yeah.
Speaker 1:Right? Like, a world where, you know, to get every time a new mathematical object is discovered or invented, depending on your philosophy, you need to train a new model from scratch is a very different one
Speaker 2:Yeah.
Speaker 1:From a world where the models can
Speaker 2:Pick it up and run with it.
Speaker 1:Yeah. Pick it up or or even just one where, like, general knowledge and capability in algebraic geometry generalizes to Yeah. Algebraic geometry plus epsilon. Yeah. Right?
Speaker 1:Like, I can think of maybe 10 new versions of what a space is from the last ten years or something like that. And, you know, it's not even just continual learning — you can imagine a model that learns what those are but doesn't have the capacity to play with them the same way you would. I think — yeah. So here we're talking about a very special case, where we have a model that's very good at making constructions that are verifiable. I think it's an open question how much that correlates with sort of broader capability.
Speaker 1:It would clearly be like an epochal development though.
Speaker 2:So For yeah. For for math Yeah. Certainly. And then for broader capabilities, it would just depend on what that system looks like in terms of its generalization. Right.
Speaker 2:Where it could be extremely narrow. I mean, we've we've, you know, hill climbed our way to game playing. Right. You know, and if it's Yeah. Just sort
Speaker 1:We already — I mean, we already see some constructions like this from AlphaEvolve. And AlphaEvolve is not doing interesting proofs.
Speaker 2:Right, it's doing interesting constructions. And actually — you're on the record on Twitter, in a thread with some Epoch folks, I think, saying that the interestingness of the AlphaEvolve constructions is nonetheless limited. I think you're—
Speaker 1:Yeah. I would say the specific constructions are primarily interesting because, you know, it's an automated system doing
Speaker 2:it. Yeah.
Speaker 1:Yeah. But, yeah, I I one could imagine a future iteration where the constructions themselves I mean, I think that's what your benchmark is trying to Yeah.
Speaker 2:That's exactly right.
Speaker 1:Where the constructions themselves are of of substantial independent interest.
Speaker 2:And the last real thing I have on my mind on this is generalization — not just to other fields of math but, and this is outside your direct area of expertise, to scientific fields, R&D fields. Like, in a world where AI systems are regularly resolving interesting math problems, and let's say not through hyper-specialized methods Right. like, not AlphaProof Right.
Speaker 1:Or AlphaEvolve.
Speaker 2:Yeah. Not AlphaEvolve.
Speaker 2:What do you think that looks like for the sciences, for AI R&D, like—
Speaker 1:I mean, my view is that right now, the primary obstructions to AI doing autonomous high quality math research are the same as the obstructions to doing any kind of economically valuable labor. Mhmm. Like, you know, sometimes you need to be creative. You need to adapt to new techniques. You need to learn something.
Speaker 1:You need to work on something for a long time. Those are all things that current systems have problems with. Yep. I I suspect, first of all, that if those problems are solved, the models will become very good at math research. Yep.
Speaker 1:And, well — I also think solving those problems is more or less required for the models
Speaker 2:to become very good at math. I think — yeah. I mean, one interesting thing that I was talking about with someone at a foundation who may or may not fund this benchmark Mhmm. is: what are the chances that there's an extra ingredient that makes math difficult, so that this ends up being one of the last things
Speaker 3:Uh-huh.
Speaker 2:to fall on the path to more societally transformative AI.
Speaker 1:I think that's pretty unlikely. I mean, that said, you know, I think it's very hard to understand what's required for good math research. Sure. You know, it's some kind of introspection. Mhmm.
Speaker 1:I think that, you know, there's evidence that it involves things like creativity and Mhmm. You know, working hard for a long time and that kind of stuff. It's possible that's not true. And in that case, I think, you know, math will, you know, might might have you know, there might be more progress before we see other areas. I think it's very unlikely that there's an extra ingredient to to mathematics.
Speaker 1:It seems to me like this is just me observing my daily life. Yeah. The practice of me doing my job is not that different from the practice of other people doing their jobs. I mean, okay. There is a lot more like lying back in my couch and staring at the wall and just kind of thinking about stuff.
Speaker 1:But like, I I think the actual ingredients are pretty much the same as any other economically valuable labor. Very good.
Speaker 2:Are there other questions we should have asked you or things you wanted to talk
Speaker 3:about get onto the record?
Speaker 1:Yeah. Maybe one thing we just touched on a little bit, but that I want to talk about a bit more — I think a big question here, and I should say I think I learned this point of view from Tamay, is about the marginal cost of doing different mathematical activities. Uh-huh. So I think right now, at least, where I expect AI to have a significant impact is that the marginal cost of, like, trying something is getting very small.
Speaker 1:And that has an impact in large part because, you know, sometimes you make a conjecture where, in order to check it, you should probably have run some computer programs Sure. to do some computation, and you were too lazy to do that. A lot of conjectures have this flavor, and you don't need a very capable system to resolve that kind of conjecture. We're already seeing a lot of these things fall. And, yeah, I think that's a huge deal.
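A throwaway example of my own (not one of Litt's) of the "too lazy to run the computation" failure mode: Fermat conjectured that every number of the form 2^(2^n) + 1 is prime; a cheap computation falsifies it at n = 5. Many informal conjectures die this way the moment someone, or something, bothers to check.

```python
# Checking Fermat's "all F_n = 2^(2^n) + 1 are prime" claim by trial division.
def small_factor(m, bound=10**6):
    """Return a nontrivial odd factor of m below `bound`, or None if none found."""
    d = 3
    while d < bound and d * d <= m:
        if m % d == 0:
            return d
        d += 2
    return None

for n in range(7):
    F = 2 ** (2 ** n) + 1
    print(n, F, small_factor(F))
# n = 5: 4294967297 = 641 * 6700417 (Euler's refutation); n = 6 factors as well.
```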
Speaker 2:What what is what sort of it might not be the right concept. What sort of acceleration to math progress does that give you or does it open up new angles or like what what do you expect?
Speaker 1:Yeah. So I don't think of it as much about acceleration. It's just, like, you know, there's some tax on progress, which is: you have to try something. And bringing the cost of that down is meaningful, even if I think the primary bottlenecks are elsewhere. Like, sure. Yeah.
Speaker 1:Like, having to try something — that is a bottleneck, for sure. It's a friction. Yep. But I think the primary obstructions to progress are, like, you have to have a good idea. Yep.
Speaker 1:And, you know, most people have a couple of those a year. Yep. It's not clear to me how much this helps with that. But, you know, sometimes you don't need a good idea. Yeah. And knowing when you don't need a good idea is very valuable.
Speaker 1:Like, sometimes you just need to sit down and grind something out. Yeah. So I think, you know, even holding capabilities fixed, we should expect a lot of frictions to disappear. And that seems to me currently to be the area where I expect the most progress.
Speaker 2:Just like, even though you don't expect that to cause a discontinuity in math progress, that's the framing you find most helpful Yeah. For what AI is bringing to math. Yeah.
Speaker 1:Right now. Yeah.
Speaker 2:Right now.
Speaker 1:Yeah. Mhmm. Cool. Yeah. I think in terms of acceleration or whatever, I would really love to see some kind of way of operationalizing that and actually measuring, like, are we actually seeing acceleration?
Speaker 1:You know, like, have we ever seen acceleration is kind of an open question to me. Totally. Like, has the productivity per person of mathematical work increased over time? It's just not clear to me, even though we have all these sort of new tools. Like, you know — how do you even operationalize that question?
Speaker 1:Yeah. You can look at citations, I guess. Like, it's not No. It's it's not it's a sort of obviously very bad proxy. It also seems to correlate relatively well with population.
Speaker 2:Before we wrap up, any any things you're looking for in the next couple of months, I don't know, that would be interesting? Anything short term that
Speaker 1:we— Yeah. What I would say is: so I think in, you know, August maybe, I said something like, it's weird that we haven't seen a lot of mildly interesting conjectures resolved by AI, given the capabilities of current systems. So mildly interesting, but attention-bottlenecked things. I guess we're now starting to see those, arguably. So, yeah, I would expect, you know, within the next year, we'll see a lot of those.
Speaker 1:So a lot of, you know, problems that no one has really thought much about — sort of throwaway questions, but things that people found interesting enough to write down Yeah. — resolved. Honestly, that seems quite likely.
Speaker 2:Alright. We will have plenty of opportunities for emergency podcasts in the future. Everyone follow Daniel on Twitter. It's, honestly a good a pretty good source as these things go of AI math news writ large for better or for worse. But, yeah, we appreciate it.
Speaker 2:Thank you so much for talking with us.
Speaker 1:Thanks so much for the invitation. Yeah. It was great. Great to be here. Yes.