Steve Hsu is Professor of Theoretical Physics and Computational Mathematics, Science, and Engineering at Michigan State University. Join him for wide-ranging conversations with leading writers, scientists, technologists, academics, entrepreneurs, investors, and more.
Welcome to Manifold. I am recording a special episode today in which it's mainly just me talking to you, but I have off camera here with me an old friend with whom I've discussed AI for 30 plus years probably and the main theme of this episode is going to be me recounting the travels that I've been taking in the late spring and early summer so far this year, all of which have been AI related.
Steve Hsu: And I'm going to get into really what you could consider an update on the current status of AI. The geographical locales that this narrative is going to be based in include Silicon Valley, San Francisco and Berkeley, Singapore, and the Philippines. And each of these plays an important role in the narrative, or really the status report, that I'm going to give you on the state of AI.
So let's jump in. My friend Jim is the interlocutor. Jim, are you there?
Jim: I am here. How’s it going?
Steve Hsu: Great. All right. So feel free to jump in with any burning questions or clarifications both for yourself and for the audience.
Let me start by saying that I've spent quite a lot of time in San Francisco, Silicon Valley, and Berkeley in the last couple months.
And there, I would say, there is a huge AI bubble developing. Bubble can be meant in multiple ways. It could refer to a kind of isolated, unique viewpoint that's present mostly there and among people who are very online. Or it could mean a bubble in the financial sense, as in the valuations of all these AI companies, including NVIDIA, are way higher than they should be.
And so let me get into that. I've had conversations with people at all the major AI companies: Google, Meta, OpenAI, Anthropic. And of course the topic of discussion is when are we going to get to AGI. How powerful is the next generation of models post GPT-4 going to be? And related to that, and this is something that impacts the part of the story where I'm across the Pacific in Singapore and Taiwan, when is someone going to start actually making serious money from AI?
Because I'm going to claim to you that, at least as far as these language models go, there isn't really serious money being made yet. Another way of saying no serious money is being made yet is that the real impact on the economy is thus far still de minimis. So let me talk about the AI bubble in the Bay Area, and about every dinner, every meeting that I've had with people there, especially the ones who are in this race among these companies, and it is a flat-out race.
So for those of you who are interested in AI safety, proceeding cautiously, or pausing AI: forget about it. It's a full-blown, all-out race between these companies.
And a central part of the discussion is a kind of scaling law. Physicists are familiar with scaling laws. In the case of AI, there's some empirical evidence that says that if you increase the size of your model, the amount of compute you expend in training it, and the amount of data that you train on,
and if you scale all three of those parameters with certain relationships between them, you could be on a kind of optimal frontier. And by extrapolating the few empirical points we have so far on that scaling relationship, you might be able to predict how powerful the next generation of models will be. And of course what you mean by how powerful is in terms of the metrics which have been used to measure the existing models. There's a whole array of different metrics that people subject every new large language model to that sort of determine how powerful it is relative to all its peers.
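To make that concrete, the kind of empirical relationship being referenced is usually a Chinchilla-style fit of loss against model size and training data; the form below is an illustrative sketch of that relationship, not any particular lab's actual numbers or exponents.

    % Illustrative Chinchilla-type scaling law (a sketch, not any lab's exact fit)
    % N = number of model parameters, D = number of training tokens,
    % C \approx 6 N D = training compute in FLOPs.
    L(N, D) \;\approx\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
    % Minimizing L at fixed compute C gives the compute-optimal frontier,
    % where parameters and tokens are scaled together, roughly
    % N_{\mathrm{opt}} \propto C^{0.5}, \qquad D_{\mathrm{opt}} \propto C^{0.5}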
And so a core question in Silicon Valley, and in that particular thought bubble, is: are we going to be able to stay on this scaling curve? And if we are, what's it going to take? And I would say there is really kind of maximal uncertainty right now about whether we are going to stay on that curve.
To put it very succinctly, one longtime Google AI researcher that I talked to happens to have roughly the same views that I have on this. And the way he characterized it was a 70 percent chance that between four and five, i.e. for the next generation of models that may be coming out in 2025, you will see a bending over of the curve. You will see a slowdown in the rate of progress, in other words, in how much better the model will be as a function of the increase in those parameters which affect it: training data size, model size, amount of compute expended.
So a 70 percent chance that the rate of improvement, say from GPT-3 to GPT-4, will not be fully replicated in jumping from four to five. So a 70 percent chance of that, or at least, you know, 50 percent or more, and then the remainder that we will stay on the scaling curve, at least for one more generation. And who knows, maybe beyond that. Jim, I think you have a question.
Jim: Yeah, is this because there's a bottleneck in, like, maybe data size or hardware coming up? Or is this because we'll actually be able to increase the data size and hardware and it just won't give as good gains?
Steve Hsu: I think the most plausible cause, so if we were talking, say, a year or a year and a half from now and looking backwards, and suppose that we sort of fell off the scaling curve, i.e. the curve bent over a little bit, we underperformed the projection currently made from the scaling relationships.
I would say the most plausible reason is that, in order to get, say, another 10x in training data, people have started to resort to either synthetic data, so training data that's actually made up by models, or they've started to use lower quality data. So, if you've already crawled the web, and you've already sucked in all the text of every book that's been digitized that you can lay your hands on, and, you know, you've started to use the arXiv and scholarly journals and stuff like that, at some point you're running out of tokens. And, you know, we're already at the multi-trillion token level for training sets, and to go another order of magnitude beyond that would be, I think, difficult. Now, there are more tokens available, so if you own Meta or you own Google, you have access to all the stuff that's ever been typed onto Facebook or into a search box on Google.
You may have more tokens, but the quality of those interactions or those strings of text may be lower than what was available for the first few trillion tokens. And so either for that reason, or because synthetic data just is not as good, I think that is the most plausible way that the slowdown will happen.
I think that in terms of the money and energy and compute that's required to go to the next step, and if you have enough compute then you can have a model size which is big enough, those resources are available. So I think if there is a problem, it's probably going to be a data limitation. It's probably not that the architecture is not good enough to continue to improve.
Jim: Got it. So data is limited.
Steve Hsu: Yeah, I think that's the case. Now, again, all of this is speculative, and the thing that's reinforced here is that I've been talking to pretty much everybody in the field: people who are very, very senior at the top AI companies, down to engineers who are really working hard to get things to scale, to do all kinds of grindy work that isn't glamorous at all but is really necessary to keep improving these models or build the next generation model. I've talked to people across that whole spectrum. And there's a very broad divergence of views. I would say I didn't meet anybody who is 95 percent or more confident that scaling will continue. I did meet people who, as I said, have most of their probability weight on scaling kind of slowing down a little bit.
So that's kind of the range of what I encountered, and I think there's at least a 50 percent chance there'll be a little bit of a slowdown. Now, one point I wanted to make that I think people don't understand is, if you imagine, like, what are the top researchers at OpenAI or Anthropic or DeepMind doing?
It's not all fun and games. The architecture itself has not changed that much. I mean, we're still using a kind of transformer architecture. And although there have been slight tweaks to the way the attention heads work, the way they implement the context window, the way they can have a stream of side data which kind of goes along for the ride and can be accessed by the neural net, there are small deltas, but mostly we're using the same architecture.
So what these people are doing is not some kind of deep mathematical work where they're trying to figure out what's a better way to structure this neural net. Mostly what they're doing is really hands-on, grindy stuff, where they're trying to lay their hands on, you know, the right human evaluation for RLHF, or cleaning the pre-training data.
Just stuff that really requires ingenuity to automate as much as you can, but then you can't avoid having some human intervention, where humans have to go in and do quality control on the stuff that you're using for your pre-training or your post-training. And so it's not actually as glamorous as you might think. It kind of boils down to grindy, difficult work, and anybody who's worked at a tech company in software realizes that it eventually comes down to that: even though the original core idea of what your technology is based on may be very brilliant and elegant, in the end there's a lot of grindy stuff.
I just want to emphasize that. I emphasize that a lot to our team at Superfocus, because a lot of what we have to do to apply these models, which I'll talk about in a minute, is kind of grindy, even though our original ideas are elegant. A lot of stuff we have to do is grindy and so everybody just has to get used to the grind.
So, now there are people who I think tend to not be involved in actually pushing the models to the next level, but are just sort of observing with a lot of interest from the sidelines, who like to speculate, and they want to speculate about the exciting possibilities. And exciting possibilities are like fast takeoff, or we stay on the scaling curve for several orders of magnitude.
And if we stay on the scaling curve for several orders of magnitude, we end up spending, you know, billions, tens of billions, hundreds of billions of dollars in CapEx to build the computers to do the training runs. It becomes very exciting stuff, where you're making trillion-dollar bets.
And if you extrapolate along the curve, the evaluation metrics have the model becoming pretty much superhuman in every respect. And so at that point you could imagine that the AI is so good that it starts to, in some kind of recursive way, improve itself, or it is a better AI engineer than a human AI engineer. You can make many instances of it and it starts to improve itself at a pace that we can't follow.
So those are all the exciting possibilities. And I'm not saying those exciting possibilities won't be realized. But do I think they could be realized in the next few years? That seems kind of implausible to me. Also, I think Yann LeCun, who has been very critical of this notion that LLMs, that transformer architectures, will be enough to get us to AGI, I think he has a point. I want to comment on that a little bit. I think people misinterpret what he says. He's not being negative about LLMs. He thinks LLMs are great. But he just thinks that they're not going to be everything, that some additional innovations at the architectural level will have to be made before we really reach full AGI.
And to me, that's also plausible. The way I would describe it is that, given LLMs as good as what we have now, or maybe just one or two generations beyond, if I embed them in a larger software architecture with external modules, like an external memory module or some kind of planning or logical control module, I could make something that from the outside really seems like an AGI, even though on the inside maybe it's not. Maybe it's a very, very advanced kind of language reasoning center connected to kind of old-timey software
that's providing an external memory and external goal orientation and planning. Nevertheless, I think that kind of thing is achievable. But in order to achieve something that has long-term goal orientation and a kind of attached memory function, all those things within one neural net structure, the whole thing being a neural net, not a neural net embedded into a more traditional software architecture, but a full-blown neural net that has all of those subcomponents, I think there will have to be some architectural innovations. And I think that's what Yann LeCun is emphasizing, saying if you're a student, and your main goal is to get to AGI, just think of LLMs as an achievement that gave us a language center, a language part of the brain, but there's still some other things that we need.
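To make the idea of an LLM embedded in a larger software architecture concrete, here is a minimal sketch of that pattern: an LLM call wrapped by conventional code that supplies an external memory and a long-term goal. Every name and function here is a hypothetical illustration, not Superfocus's or anyone else's actual system.

    # Minimal sketch: an LLM wrapped by ordinary software that supplies
    # external memory and goal-directed control. All names are hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class Agent:
        llm: callable                                  # any text-in/text-out model call
        memory: list = field(default_factory=list)     # external, persistent memory
        goal: str = ""                                 # long-term goal held outside the net

        def step(self, observation: str) -> str:
            # Retrieve stored facts relevant to the observation (naive keyword overlap).
            relevant = [m for m in self.memory
                        if any(w in m for w in observation.lower().split())]
            prompt = (f"Goal: {self.goal}\n"
                      f"Relevant memory: {relevant}\n"
                      f"Observation: {observation}\n"
                      "Decide the next action and reply concisely.")
            action = self.llm(prompt)
            # Write the interaction back to memory so future steps can use it.
            self.memory.append(f"{observation} -> {action}")
            return action

    # Usage with a stand-in model; a real deployment would call an LLM API here.
    agent = Agent(llm=lambda p: "ask the customer for the TV model number",
                  goal="resolve the support ticket")
    print(agent.step("customer says the screen is black"))

The point of the sketch is only that the memory, the goal, and the control loop live in ordinary software, while the neural net supplies the language and reasoning in the middle.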
And to get all of that integrated into one neural net requires some actual creativity and, you know, brilliant breakthroughs, not just us moving along the scaling curve with essentially the same architecture. Jim, you and I have talked about Yann. Do you think I'm being fair to Yann or overly generous to him?
Jim: Well, I think to me it seems kind of obvious. And actually this is another topic he was talking about on Twitter before. It seems kind of obvious to me that we are already feeding the neural nets a lot more data than you feed a baby. And so there's something non-optimal going on; there's a lot of room to improve that at this point.
Steve Hsu: Yeah.
Jim: But.
Steve Hsu: Well, I was going to say, I think that's a somewhat separate logical point, because imagine the thing that we have isn't able to generalize from small amounts of data as well as a human brain, but nevertheless we are able to train it with lots of data.
So you could imagine a thing which is not as good at learning and generalizing as the human brain, but which can still achieve superhuman capability given the amount of training data and compute we're able to use.
Jim: Yeah, so I'm, I'm talking more about the input, the training side of things. It seems like there's gotta be some way to, to feed a lot less data and get it to where it needs to go, but you keep the same, the same transformer architecture in there, just the way that the learning happens would improve.
Steve Hsu: I'm not sure I understand what you mean, because if you keep the same transformer architecture and you're still using gradient descent, which everybody uses, then there isn't additional room for speedup. I think you would have to go to some much better architecture than the current transformer to get to sort of super-fast learning and generalization.
Well, you, you have a structure that you've defined and you're training it using gradient descent. And so, if you keep training it using gradient descent, then you're not going to get a speed up there. And if you keep the same architecture, you're not going to get a speed up, you're not going to get a radical qualitative change, right?
So, I think what Yann is saying, and what I believe, is that there are architectures that are much better than the transformer that we're currently using, but we don't know what they are.
Jim: Well, I guess what I'm saying is maybe there’s something better than gradient descent.
Steve Hsu: That's hard to say, because there are theorems about gradient descent and how it's actually close to optimal, like it's actually a convex optimization in a certain limit. So I doubt that. Well, what you're saying is possible, but it sort of takes us off the track of what I wanted to discuss.
I'm just hoping that I'm clear with the audience that I'm stating Yann LeCun's position, or at least a charitable interpretation of his position, precisely and understandably. And it's sort of my position as well: the most interesting work is not trying to further scale these models, but trying to understand what different architectures might do a better job than these models.
I think that's his position; that's also my position. Now what's interesting about that is, if you do believe that that is necessary for the next step forward, you then have to realize you're going to do a lot of experiments. So these experiments would be like someone comes up with a radically different architecture and then has to also do a huge amount of training, you know, billion-dollar training runs and stuff like this, on it. And experiments, as you know, require many, many iterations before you hit on the right thing, because everything in AI is empirical. And so if you ask, like, oh, given where we are now, how could it take another 20, 30 years to get to, you know, super AGI? I can tell you stories where it takes another 20, 30 years, no problem.
I don't think that's the predominant view in Silicon Valley. I think, I mean, among the people who most enjoy writing and talking about AI, they like to think about the fast takeoff scenarios where, without a change in this architecture, keeping it more or less a transformer and just moving along the scaling curve, we get, you know, we get superintelligences out of it.
Jim: So are you putting it around 20 to 30 years at this point?
Steve Hsu: I guess, you know, I could continue to tell stories, but no, I think most of my probability distribution would be that in 20 to 30 years we will have superintelligences. And even in the worst-case scenario, where we only have, like, an LLM which is, you know, one or two generations better than GPT-4, I'm pretty sure I can build something with these other functions that the existing transformer model isn't particularly great at, you know, the external memory or some kind of long-term planning or goal orientation.
I think I could build those in software using other stuff and then connect it to the LLM. That's the kind of thing that we do at Superfocus. And so one way or the other, for sure, I would say in 20, 30 years we're probably going to have some kind of superintelligence-like thing. The exact nature of it, of course, is not defined, but clearly better than human.
So maybe that's enough about the view from Silicon Valley. It's an all-out race. I don't think there's that much thought, I mean, there are people doing research and thinking about safety and alignment, but I think the racing part of it is totally dominant over the safety and alignment part.
I think if you're an e/acc person, what I'm saying is probably going to make you smile. If you're a Pause AI person, you're going to be gnashing your teeth at what I'm telling you. But I'm just telling you what I observe from talking to lots of insiders.
But now let me turn to the possibility that all of this stuff is a bubble and by bubble I mean in the sense of the bubble in the first internet, in the phase of the first internet revolution, late 1990s, early 2000s, which Jim and I actually lived through.
So in Silicon Valley, we often talk about a hype cycle. You have a period where things are hyped to the moon, everybody gets super excited, and they bid the price of NVIDIA to infinity. And then there's a kind of peak, and then people start to realize that this new technology, whether it's the early internet or it's these first language models that are coming out, is not actually generating that much economic value.
And so then there's usually an overreaction where people kind of get disappointed. And then it takes some time where in the background the technology is being worked on seriously and eventually it does achieve all the impact that the early hype cycle people, hypesters claimed it would achieve, but it just took a lot longer than people thought.
And so if you look at, say, Amazon, you know, if you were a hypester, you were willing to pay a lot of money for, like, companies that were selling pet food online, but in reality it took a decade or two before Amazon really became a profitable company,
and a monopoly which, you know, dominates the world. So, we could go through something like that also with language models.
And so now I want to pick up my travel story in Singapore. I went to Singapore. I had meetings with top government officials including the chief AI officer, including people on their economic development team.
Teams that try to recruit AI startups to Singapore. They're very serious about trying to get AI startups to come to Singapore. But the main reason I went to Singapore was to meet with the CTO and CEO of a big public company, which is one of the largest outsourcing companies for customer support in the world.
It's a public company. Their share price has not been doing well, because once ChatGPT came out, people realized that their business is kind of a legacy business and maybe we would be replacing all the humans who work in call centers with AIs. And so the market has really crushed the share prices of all of the outsourced customer support companies in the world.
And this company that I was meeting with, I guess I can say the name, is Teleperformance. And so I was meeting with them to show them our work at Superfocus, to demo our AI, which can use voice and can perform complicated customer service tasks. We can take a memory, where the memory is built out of the actual training materials that a company would use to train a human to do this tech support sort of thing, and we can attach that memory to the LLM and make the LLM restrict its conversation to topics which are covered in the memory and not deviate from the information and the policies that are in the memory.
So we can basically create a customer service AI that does not hallucinate and is more efficient, superhuman in the sense that it makes fewer errors than a human customer support agent. And so I was demonstrating this to the people on this team while I was in Singapore. And the demonstration was a success.
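The general pattern being described, sketched generically below and not as Superfocus's actual implementation, is to retrieve from an approved memory, answer only from what was retrieved, and refuse or escalate when the memory doesn't cover the question. The knowledge base, function names, and the stub model call are all made up for illustration.

    # Generic sketch of the "attached memory" pattern: answer only from an
    # approved knowledge base and refuse otherwise. Illustrative names only.
    KNOWLEDGE_BASE = {
        "reset procedure": "Hold the power button for 10 seconds, then restart.",
        "warranty period": "All models carry a 12-month limited warranty.",
    }

    def retrieve(question: str) -> list[str]:
        """Return knowledge-base entries whose topic shares words with the question."""
        words = set(question.lower().split())
        return [text for topic, text in KNOWLEDGE_BASE.items()
                if words & set(topic.split())]

    def answer(question: str, llm) -> str:
        facts = retrieve(question)
        if not facts:
            # Nothing in memory covers this topic: refuse instead of guessing.
            return "I don't have that information; let me transfer you to an agent."
        prompt = ("Answer ONLY using the facts below. If they do not contain the "
                  "answer, say you don't know.\n"
                  f"Facts: {facts}\nQuestion: {question}")
        return llm(prompt)

    # Example with a stand-in model; a real system would call an LLM API here.
    print(answer("What is the warranty period?",
                 llm=lambda prompt: "(a real model would answer from the retrieved facts)"))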
And subsequently, with Teleperformance, Superfocus has participated in three openings of AI labs, one in Seoul, one in Tokyo, and one in Shanghai. These are Teleperformance labs. They're places where their customers come in. They have big customers, some of the biggest Korean, Japanese, and Chinese companies, banks, consumer electronics, department stores, et cetera, who come into the AI lab to see what the technology can do.
And our technology was used to demonstrate the power of AI in these lab openings, which just happened in the last month. So that was a great success for our startup, to be able to, you know, partner with a big public company like that. But one of the things I learned from talking to them is that there are very few startups, or any entities at all, that have built a similar technology to what Superfocus has built.
So that was interesting for me to learn. And of course, in talking to our counterparties like Teleperformance, and also other call center companies that operate in Manila, and I'll get to Manila in a second, I can tell you about the amount of economic impact from these language models in the enterprise.
So, actual applications by companies, where it's not some hit-or-miss thing where some kid is trying to cheat on some essay, or someone at home is trying to look something up and they just ask GPT what it is and the answer is probably right, but there might be some little problems with it. I'm not talking about applications like that.
I'm talking about applications where the AI really can be substituted for a human, and its answers can be relied upon as well as the answers of the human it's replacing could be relied upon. There are really very, very few real-world deployments, almost none at scale at the moment. They will be coming this summer.
It will be much slower than people think, because it is difficult to get these models to behave the way you want, to limit hallucination et cetera, et cetera. And so that was the lesson of Singapore. And then I moved on to Manila, where we have existing customers and one of our investors is a private equity fund in Manila, who has introduced us to all of these Philippine companies.
The Philippines is one of the leaders in this kind of outsourced call center work, or, as they call it, BPO, business process outsourcing, work, and customer support work. And it's because wages are relatively low in the Philippines, but generally you can find people who speak English. And so if it's a job that can be done remotely by someone speaking English, then the Philippines has become a global center for that kind of activity. India as well.
In the Philippines, about 10 percent of their GDP is generated by this kind of remote, outsourced work of the type I just described, which is potentially replaceable. The humans doing it are replaceable by AIs of the type that we build. So it's 40 billion dollars, 10 percent of the Philippine GDP, that's really up for grabs, if you like, to AI. But we're just at the beginning of rolling out AIs that can do this kind of work. Any questions about this, Jim?
Jim: Yeah, I'm just curious if you're seeing other companies as a result, so it's an [unclear] like you guys had.
Steve Hsu: So if you ask specifically about AIs that can do really complex support, one of our customer projects is probably the biggest: a consumer electronics company that makes smart TVs.
And we've built an AI that can troubleshoot all 300 models of smart TVs they've sold over the last 20 plus years. I've not seen anybody, any other company that's built something that complex and is close to deploying it.
I should add that one of the technical challenges we had to overcome is that our AIs can use voice. And all of the engineering and technology that we had to build, in which we have a model doing inference while consulting an attached memory, all of that has to happen in less than two seconds, including the speech-to-text and text-to-speech parts of it.
Because if you're talking to someone and you finish speaking and you have to wait more than two seconds for that other person to respond, it feels weird. Okay. And so we had to engineer it to the point where the latency is about two seconds or less. And so I've yet to encounter another company that's built all the stuff that we've built and is as close to deploying complex AIs of the type that we've built.
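Roughly, the round trip being described is speech-to-text, then model inference against the attached memory, then text-to-speech, all fitted inside a latency budget of about two seconds. The sketch below just shows that budget structure with placeholder stages; it is not the actual Superfocus pipeline, and no particular speech or model service is assumed.

    # Sketch of the voice round trip with a latency budget. The stage functions
    # are placeholders; a real system would plug in actual STT, LLM, and TTS services.
    import time

    LATENCY_BUDGET_S = 2.0   # total time a caller will tolerate before hearing a reply

    def speech_to_text(audio: bytes) -> str:
        return "my tv screen is black"                      # placeholder transcription

    def grounded_llm_reply(text: str) -> str:
        return "Let's try a power reset: hold the power button for ten seconds."  # placeholder

    def text_to_speech(text: str) -> bytes:
        return text.encode()                                # placeholder synthesis

    def respond(audio_in: bytes) -> bytes:
        t0 = time.monotonic()
        text = speech_to_text(audio_in)       # stage 1: transcribe the caller
        reply = grounded_llm_reply(text)      # stage 2: LLM consulting attached memory
        audio_out = text_to_speech(reply)     # stage 3: synthesize the answer
        elapsed = time.monotonic() - t0
        if elapsed > LATENCY_BUDGET_S:
            # In practice you'd stream partial audio or play a filler phrase
            # rather than just logging the overrun.
            print(f"warning: round trip took {elapsed:.2f}s, over budget")
        return audio_out

    print(respond(b"...caller audio..."))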
Jim: With hallucination in particular? Are there other companies? Or?
Steve Hsu: There are a lot of people working on it. So there's a company called Perplexity that's trying to do search. And so they are obviously fighting hallucination all the time. So is Bing. So is Google. Anybody that's doing search, so that you can have an AI-first response to a search query, everybody who's doing that is fighting hallucinations.
And it's quite hard. Now, the problem they're solving is quite a bit harder, because the range of queries, the range of search results that come back, is huge, right? It's the whole internet. Whereas we are always working in a very confined space. So we're working on, you know, this set of televisions, this set of problems that you can have with a television, or this set of products that you could order from an e-commerce vendor, this set of problems with delivery issues. You know, it's a very finite universe.
And we find when the universe is that finite, we can build more or less completely reliable AIs.
Now, one of the questions people ask is, what's going to happen to all these people who have these jobs? Because in the Philippines, getting one of these call center jobs is your ticket to the middle class. And there isn't a big manufacturing sector there. So really the path to the middle class goes through these kinds of jobs.
And if half of them are replaced over the next five years by AIs, that will be a huge shock to the economy of the Philippines. And they are acutely aware of this there, but we have an interesting dynamic where the people that own the BPO companies, I could call them capital, have different interests than, say, labor, which is the workers.
And capital realizes what's happening, and they're very interested in partnering with AI companies like Superfocus. Labor doesn't really know what's happening. I think they realize stuff is happening, but there's not much they can do about it. And so I think that our experience in the Philippines will be one of the most interesting, and maybe for future historians iconic, you know, instances where AI capability met human labor, and you can look at what happens.
When people ask about the social impact of AI you know, there are different analogies you can use. You can say, well, when the automobile came along, a lot of blacksmiths and buggy whip makers lost their jobs, but they were able to retool and do other things. And ultimately, society ended up much better off.
But the horses were never able to recover, right? There are far fewer horses in our cities than there were before the automobile came along. So is the modal human worker more like the horse in the face of AI, or more like the blacksmith or buggy whip maker who can retool and do something else useful in the post-automobile economy?
And I'm afraid it's a little closer to the horse situation for a lot of people, but of course I could be wrong.
So another thing I wanted to get into is a festival.
So at the end of all of these travels, I stopped in Berkeley for a festival called Manifest, which was held at Lighthaven, a beautiful property about a mile south
of the UC Berkeley campus. And the property was purchased with the help of a guy called Jaan Tallinn, who is a billionaire AI investor and an investor in some of my companies, an old friend of mine. I interviewed him; if you go back several episodes of Manifold, you'll see an interview that I did with him.
He's very concerned about AI safety, existential risk from AI, AGI. And he funded the creation of this Lighthaven campus in part to foster conversations among rationalists, effective altruists, AI scientists, all of these different communities to think about the future that's coming. And so there was a conference festival held there called Manifest.
It was a ball. Jim was there, and lots of other people were there: Dwarkesh, Razib, the Collinses, who are the pronatalists. It was an amazing weekend for me. I can't remember the last time I had that much fun at a meeting and got to see so many old friends. And interestingly, although AI was probably the number one topic that people wanted to discuss at this meeting, probably number two was genetic engineering, polygenic screening of embryos, stuff that I have worked on in the past and that my company, Genomic Prediction, was a pioneer in.
And so it was very interesting that okay, we had a lot of AI to talk about, and then we also had these other things to talk about. Now these other things are a little bit controversial. You know, some people refer to this kind of thing, genetic engineering, embryo selection, as eugenics. And so I think people outside the meeting might regard some of the things that were discussed as controversial.
I think within the meeting they weren't really considered controversial. Jim, do you have any comments about Manifest? Did you have a good time there?
Jim: I had a great time there. Listened to a lot of different topics. I tried to go to as many of the AI things as possible. I guess you might touch upon that in a second. What really kind of stuck out to me, there were a lot of really smart people there. Opinions were all over the spectrum with what’s happening with AI right now. Everybody’s just sort of guessing. It seems like the algorithm that people have to figure out what's happening with AI is to build a story in their head of how it’s going to go down and then they get sorta married to that idea and then repeat it over and over again.
But you get people who think that this is nothing and it’s going to take forever to go anywhere and people who think that the singularity is coming in the next couple of years. And it's going to destroy humanity and every opinion in between.
Steve Hsu: Yeah, so on that point, you may remember you and I were sitting near the front at one of the big outdoor sessions, and Dwarkesh was the speaker at that session. And he was asking the question, is AGI at a point of imminent takeoff?
And the analogy he used was COVID in March of 2020, a point at which smart people knew we were, you know, we were in for it, and still maybe a lot of people hadn't figured it out. And so his question was, is it March 2020 for AGI? And a funny thing happened there. So one of the points that came up is whether governments are going to get involved with this thing to the point where they're committing, you know, significant chunks of GDP.
You know, like 1 percent of GDP, 2 percent of GDP. Something equivalent to what was spent on the Manhattan Project, for example, during World War II. When will that happen? Will that happen? And I remember making a comment about this, saying that, well, at the moment the amount that the government has to spend on this is quite limited compared to what private companies, big tech monopolies, can spend on it.
And consequently, everything right now is dominated by the work of these big tech monopolies. And from the back, I don't know if you remember this, someone commented on my comment and said something very interesting. As soon as the session was over, I went up to talk to him because he made an interesting comment.
And it turned out he and the guy sitting next to him were both Anthropic founders. So they definitely know what they're talking about, right? But I just thought that was a very good anecdote for how interesting this particular meeting was, that, you know, you had really quite a range of people there.
Jim: Yeah, it was pretty hard to tease out a signal on where everybody was leaning, though.
Steve Hsu: Well, I think everybody is all over the place. And so, you know, only time will tell. I guess one of the things I say to people sometimes: during all these travels I attended a meeting of CEOs, including CEOs of, you know, big public companies, and I was supposed to give a talk on AI, and one of the things I said is, there's really a lot of uncertainty right now at this moment.
If you take the set of people who are directly involved in building the next generation of models and you ask them what's going to happen, there is really quite a range in their expectations over the next few years. And so nobody really knows. I think just time will tell. One of the things that people were talking about was an interview, a four-hour-long interview, that Dwarkesh did with a guy called Leopold Aschenbrenner.
I think I'm saying his name correctly. And he has written a hundred-page manifesto. I think the title is something like Situational Awareness, and then something, something AI, AGI. And his thesis is, well, I think later he qualifies it as, like, a 20 percent chance this will happen. But when he talks about it, he talks about it with extreme confidence.
Which I find a little strange, because later, when pushed, he says, oh, there's maybe a 20 percent chance it's going to come out this way. But in his scenario, we do proceed along these scaling curves. Subsequent training runs cost, you know, 10x more each time. So you get to the point where the money and, literally, the energy necessary to power these training runs become a limiting factor.
And eventually it's driven completely by governments, and it turns into a race between the United States and China to see who will get to superintelligence first. And if that sounds like the plot of a science fiction story, it definitely is. But who knows? It might play out that way. And, you know, a lot of serious people have looked at his stuff and are thinking about it pretty seriously.
But I don't currently see any indication that either the government of China or the U.S. government is willing to put anything like, you know, a percent of GDP towards the race towards superintelligence. I think it'll be interesting to watch whether that kind of transformation occurs. Like, for example, if the Pentagon at some point develops the conviction that they really have to fund this at the level that they fund, you know, strategic nuclear weapons and things like this.
But so far I think we're pretty far from that. Any thoughts on that?
Jim: Well, it's like, all of these, like I was saying before. You have, sort of people who have a scenario in mind, and it’s often plausible sounding. But it’s one of many. And then they become married to that one and then sort of see it like laws of nature or economics are gonna keep on that trajectory and bring it to that world. But everybody has a slightly different story in mind. It’s tough to figure out which one is the right one.
Steve Hsu: I don't think we know. I don't think, I don't think anybody can. I think anybody who says I'm 95 percent confident this is how it's going to play out and they specify a very narrow scenario, I think it's, it's just not justified.
Jim: Ninety-five percent of people have a story. Again, the stories can be plausible. But they’re one of many stories and they see only their story.
Steve Hsu: But I think serious people realize that, and this guy Leopold says it himself. I think when pressed, maybe by Dwarkesh, he says, oh, maybe a 20 percent chance it's going to play out this way. So,
Jim: Well, people act like they’re very certain, even when they say they’re not.
Steve Hsu: This is what I find off putting because, you know, you and I are trained in theoretical physics. Normally when you are giving a physics talk and describing some work that you're doing, you don't display that level of aggressive confidence unless you really are, you know, 95 percent or more confident that what you're talking about is going to be right.
Whereas this guy apparently thinks it's only 20 percent likely, but he talks as if he really believes it. And I find that kind of off putting. Just, just in terms of the feels.
Jim: One of the things that kept coming up for me, as I was watching this play out in front of me multiple times over the weekend: I really thought, I have a story in mind. I won't even tell you what it is because I'm probably wrong too. I'm sort of aware that it's probably not how things are going to go, but in my mind I feel very certain about it. And it probably won't be.
Steve Hsu: Yeah.
Jim: I saw everybody else doing it, so I recognized that I was doing it too.
Steve Hsu: I think one point that a lot of people raised, maybe you raised it, Robin Hanson raised it, I raised it, is that even if it's a slow scenario and it takes 20 to 30 years, this is still going to be one of the most interesting 20-to-30-year periods in the history of humanity. And so it sort of increases the odds that, you know, if someone were to make a simulation, a sort of historically inspired simulation about humans, they might set it at this inflection point. And in any case, we're going to live through very interesting times.
Jim: Yeah. Could very well be. That aligns with my own internal story that this is going to be an interesting couple of decades. But it could be that the problems that we’re facing right now are going to turn out to be harder to solve than we thought and we’ll just sort of be stuck in a holding pattern for decades, too.
Steve Hsu: Now, another thing I want to say is that here's something I have very, very high conviction in. Definitely over the next 10, 20 years, we will for sure build AI systems that are vastly superior to humans in all sorts of non-trivial ways, whether or not the thing really fully emulates a human. You know, humans had to evolve through all of these, you know, epochs where we were animals or, you know, multicellular beings, single-celled beings.
All kinds of capabilities for autonomous action and caring about self, maybe having the illusion of self, all of those things were important to our Darwinian survival, and it may not be that easy to instantiate those specific qualities into these synthetic intelligences. It may be easier just to make something which doesn't have a, quote, self, but is just extremely good at helping you solve problems in theoretical physics or, you know, algorithmic math or something, right?
And so even if we don't get things that are fully human and superhuman, we may get, I'm sure we will get, very non-trivial forms of intelligence that we just never imagined we could interact with. And I'm sure that will happen in the next ten years.
Jim: Well, I think that's been happening all along. From the first moment that you had machines that could do, you know, millions of multiplications in milliseconds. That was already doing something that humans took a lot of effort to do beforehand, and it changed the way the world worked.
Steve Hsu: Yep, I totally agree, but it's becoming psychologically more pressing as, you know, these things now can talk to us, right?
So, you know, one thing I will say is that, because Superfocus solved all these voice problems, speech-to-text, text-to-speech, engineering the latency down, we knew, A, there were economic reasons why you want that, because obviously that helps you replace, like, a call center worker who's talking on the phone.
But we also knew that when we took these things into meetings, people would freak out. That if I could just open up my laptop and start talking to it and have a real, meaningful conversation, where I could make the conversation meander around a little bit but it would still give reasonable responses and things like this,
that it would have a strong psychological effect on the people who saw that happen. And it absolutely did. I did demonstrations like this for the Teleperformance people, for many people who run customer support call centers, and for this group of assembled CEOs. And all of them were, I think, very, very impressed that the AI could do things like this.
Jim: Yeah, I believe it. Although probably for the average person on the street even just what Alexa was doing could amaze them.
Steve Hsu: Yeah, I guess that's true. I, yeah, I guess so. I mean, I guess among the tech people or the scientists, the narrative was always how crappy Alexa and Siri were. But maybe for average people, they found it pretty magical.
Jim: I mean, I've sort of said this to you before. There are definitely jobs that are going to be replaced with computers going forward. But the one thing that seems like the most important one to me is programming. Once you get to a point that you can take a large code base, hand it to a robot, and say improve it.
It might be over at that point. You know? Yep. Because then you can just ask it to improve itself. Yeah. So all these other things we can get used to. But that one is very likely to change, you know, the earth.
Steve Hsu: Yes. If it can modify itself and improve itself, then it's a whole new ballgame. And humans are just getting to the point where we can do that.
So we're just getting to the point where we can change our genetic code and modify ourselves. Although, you know, it takes generations to do it.
Jim: Yeah. Cycle times are decades for that one.
Steve Hsu: Yeah, exactly. Alright, so I said we would do an hour of this, and we're getting close to that. Any things that I didn't discuss that you want to discuss?
Jim: I think I’m good at the moment.
Steve Hsu: Alright, well, thanks for your time.
Jim: Thank you.
Steve Hsu: I thought I'd shoot a little bonus footage here. I'm experimenting with a new camera. It's a DJI Osmo Pocket 3, and I just got it and I'm trying to figure out how to use it. I used it for the footage that you saw where I was talking to my friend Jim about AI, and I'm going to record a little bonus track here for the same episode of the podcast.
And I'm here in my backyard in Michigan. I wanted to say a little bit more about the competitive situation between China and the U.S. in terms of AI. I think this is, for some reason, not really very well understood, even by people who are working on AI in Silicon Valley. They tend not to really track
that closely what's happening in China with the Chinese models and the GPU capabilities that are available there. Let me make two points. One is that Huawei has a chipset, the Ascend 910B, and this chipset is a little bit inferior to the NVIDIA A100, but not that much worse.
And so with the availability of that chip, and also NVIDIA hardware that's been purchased either directly, before the ban started, or, you know, through intermediaries, or that maybe is available to Chinese companies through third-party data centers, for example in Singapore or the Middle East,
I don't think there's a huge hardware gap between what the Chinese companies can do and what the U.S. companies can do. It might get worse in the future, but I think that will play out over several years. One of the main issues is whether SMIC, which is the leading fab in China, is able to get below seven nanometers, to five-nanometer-scale fabrication.
The Ascend 910B is made on a seven-nanometer process. And as I said, it's pretty competitive with the NVIDIA A100. I believe NVIDIA also still uses a seven-nanometer process at TSMC, although they're going lower. So right now we're still kind of at parity in terms of hardware. In terms of the model innovations, innovations in the architecture of the neural nets that are used to actually build these language models,
I would point people to Qwen. Qwen has some interesting innovations in its architecture, in terms of the way that it handles the context window and the way it deals with attention heads, and it's claimed that for the number of connections, or size of the model, it is actually one of the best models according to the benchmarks.
Now, you can never be sure to what extent companies are deliberately training to be good at the benchmarks, in which case the actual real-world performance of the model would not be well predicted by the benchmark performance, or it could just be that questions that appear on the benchmark, or similar questions, sort of leaked into the training data.
And so, again, the benchmark then becomes a less accurate prediction of what you're really going to get when you use the model. So we don't really know, but it'll be interesting to hear from people who are using Qwen in real-world situations. Qwen is supposed to be, for example, superior to Llama 3 70B, which is the best open-source model that's generally available right now, and just a little bit short of GPT-4.
It'll be interesting to see what people say about the quality of Qwen2 in real-world situations. I think our company, Superfocus, will be doing some testing with it, because it has some language capabilities that are useful to us, in particular, you know, the wide set of languages that it can operate in.
But anyway, in the longer term, I think there's no obvious argument that says that the Chinese are going to lag behind in terms of the quality of the technical innovations on the model architecture side, or the ability to do large-scale training, or the ability to scale their models. I would say they're probably a little bit behind the U.S., but it's not a huge gap.
And I don't see any reason to think that the gap will become huge in the next few years, although, you know, time will tell. I think in the Leopold Aschenbrenner scenario, which is in his document Situational Awareness, he more or less assumes that the Chinese are going to be way behind and not really able to keep pace. It might play out this way, but again, I don't think he's made any kind of really systematic study of what's possible over there compared to over here. I hope that, you know, there are people on this side of the Pacific who are actually tracking the quality of the models and the capabilities of the companies over there.
I'm sure they're tracking what's going on here. The general situation always seems to be that the Chinese side knows better what's happening in the U.S., and the U.S. side is a little bit complacent and just, you know, willing to make kind of cartoonish assumptions about what's happening on the other side.
And hopefully that's not gonna get us into trouble.
The other part of this scenario, the Aschenbrenner scenario, which I think is important to think about, is whether, and at what point, it's plausible that the government would start thinking about AI as truly a technology that has the kind of geopolitical or military applications comparable to, say, strategic nuclear weapons. And so at what point would you get a kind of Manhattan Project to improve these models? And, you know, I don't detect any such movement at the moment in government. And so at the moment we're really in a situation where everything is dominated by the private sector and government scientists are really kind of locked out.
It's actually a shame, because if you think of the U.S. national labs, there are a lot of talented researchers there, but I think by and large they're not involved in state-of-the-art model training. When will that situation reverse? It's hard to say. I could imagine something like GPT-5 or -6, you know, the next generation, showing such amazing capabilities that it eventually becomes possible to convince the government that they should literally nationalize all the AI companies, and maybe the GPU companies like NVIDIA, and just push an effort that, you know, requires a meaningful investment of a fraction of GDP, like 1 percent of GDP or half a percent of GDP, purely focused on AI superintelligence.
It could happen that way, but it just doesn't seem like it's imminent. It doesn't seem like that's going to happen within the next year or two. And so we'll just have to wait and see. Thanks for listening.