The Kerman Kohli Podcast

In this episode, I'm chatting with Arjun Bhuptani of Connext about cross-chain bridge security.

What is The Kerman Kohli Podcast?

Kerman Kohli chats with guests about complicated topics in a way that is easy to understand.

Kerman:
Hey everyone. Today, we have Arjun Bhuptani from Connext. He's the founder, and he's with us discussing everything bridges. So thanks, Arjun, for coming on.
Arjun:
Thanks for having me, Kerman.
Kerman:
Sweet. Well, let's get this kicked off. So let's just start really simple, what are bridges? And I think we can build from there.
Arjun:
Yeah. A bridge is a way to communicate between blockchains. So using any of a number of different mechanisms, you're relaying funds and/or data between two chains, or a chain and a rollup, or two rollups. And then the piece that's important is the different ways in which you can do that. At the very simplest level, you could have some sort of fully custodial system, like a centralized exchange that is just taking funds off one chain and giving funds on another, and those have been around forever. Then you have more complex systems that build up from there. You can do things like atomic swaps, which have also been around forever, where you're swapping one asset for another using HTLC contracts on chain, up to multisig bridges, which is the way that a lot of the bridges out there in the world work today. And then you get to more complex constructions, like optimistic bridges or light client bridges, which are trustless mechanisms for fully expressive communication between [inaudible].
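The HTLC mechanism behind atomic swaps can be sketched as a toy Python model. This is purely illustrative (the class and field names are invented, not any real contract's interface): funds are claimable by revealing a secret preimage before a timeout, or refundable by the sender afterwards.

```python
import hashlib
import time

class HTLC:
    """Toy model of a hash time-locked contract (illustrative only)."""

    def __init__(self, hashlock: bytes, timelock: float, amount: int,
                 sender: str, recipient: str):
        self.hashlock = hashlock    # sha256 hash of a secret preimage
        self.timelock = timelock    # time after which the sender can refund
        self.amount = amount
        self.sender = sender
        self.recipient = recipient
        self.claimed_by = None

    def claim(self, preimage: bytes) -> bool:
        # Recipient claims by revealing the preimage that matches the hashlock.
        if self.claimed_by is None and hashlib.sha256(preimage).digest() == self.hashlock:
            self.claimed_by = self.recipient
            return True
        return False

    def refund(self, now: float) -> bool:
        # Sender reclaims the funds once the timelock has expired unclaimed.
        if self.claimed_by is None and now >= self.timelock:
            self.claimed_by = self.sender
            return True
        return False

# In an atomic swap, mirrored HTLCs on two chains share the same hashlock,
# so revealing the secret to claim on one chain lets the counterparty
# claim on the other.
secret = b"my secret"
contract = HTLC(hashlib.sha256(secret).digest(), time.time() + 3600, 100, "alice", "bob")
assert contract.claim(secret)
```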
Kerman:
Sure, so it feels like there's two vectors. Well, first question is, is there a difference between bridges that communicate data versus ones that hold money? Or are they more or less the same thing?
Arjun:
Yeah, that's a good question. So I think what we are seeing right now is the bifurcation of different bridges into messaging layers and liquidity layers, because people are realizing that it makes sense to separate those things out. In theory, there isn't actually any difference. The primary difference is really just that, in one case, you're having to bootstrap a bunch of liquidity, which is a secondary step. You'd basically need liquidity on a given chain to be able to send assets on that chain. Versus in the other case, you're just sending a message, and that message could be "mint an asset." But when you mint your own asset, you're basically creating a wrapped or representative version of the asset that you had on the other chain.
So I think these are two different things only in the sense that they have different requirements. And assets are such an important category of things for blockchains that it's worth special-casing them, but they're not so different in the sense that they're still underpinned by the same core mechanism. And most of the time, the liquidity and messaging layers are intertwined anyway.
Kerman:
Sure. So the way that I basically interpret that is a messaging bridge is a poor man's bridge, because it has no defensibility: it holds no assets and it's easy to replicate effectively. You'll probably have a lot of messaging bridges, making the defensibility quite low, versus bridges that hold assets are, of course, a lot more defensible given the state they contain that's backed by money. Is that a correct representation of how this industry's going to play out along those two dimensions?
Arjun:
Yeah. I would say there is no sense in building... Because of exactly what you said, there's no defensibility in just the messaging layer by itself, my expectation is every single messaging layer is going to have its own liquidity layer or it's going to have a partner. So in our case, we are a liquidity layer and Nomad is the messaging layer, and we work on a one-to-one basis closely together. For other systems out there, LayerZero has Stargate, which is its liquidity layer, Axelar has Satellite, Wormhole has Portal. Every system is now bifurcating like that, and everyone is realizing liquidity is the way that you build that moat.
Kerman:
Completely. Cool. Sweet. So now when we talk about bridge designs, we've got, like, [inaudible] Binance, some sort of centralized setup, and then you've got the next kind of one on the spectrum, which sounds like it would most likely be a multisig, or... Is there anything in between?
Arjun:
Yeah, I mean, I guess there's something that's fully custodial, then there's multisigs of various sizes. So you could have a two-out-of-two multisig, up to however many, N out of M. And in that spectrum, you basically have small multisigs, which are MPC systems and stuff like that, like the Ronin Bridge or Harmony or all of the other ones with a small number of signers that got hacked in similar situations. And then you could have larger, unbounded, effective multisigs. So basically a large unbounded set of people, which usually takes the form of something like a POS chain. Axelar falls into this category: it's a Tendermint POS chain with an unbounded validator set, and everybody's staking in the system. But of course, there's still M of N people that are deciding that something needs to happen. And that validator set size is still probably smaller than the chains that it's bridging between.
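The M-of-N approval rule that underlies all of these multisig designs, from a 2-of-2 up to a large validator set, reduces to one line. A minimal sketch (function and signer names invented for illustration):

```python
def multisig_approves(approvals: set, signers: set, threshold: int) -> bool:
    """M-of-N check: count how many recognised signers approved.
    Approvals from unknown parties are simply ignored."""
    return len(approvals & signers) >= threshold

# A 3-of-5 multisig: compromising any 3 of these keys compromises the bridge.
signers = {"s1", "s2", "s3", "s4", "s5"}
assert multisig_approves({"s1", "s2", "s3"}, signers, 3)   # 3-of-5 passes
assert not multisig_approves({"s1", "x"}, signers, 3)      # unknown signer ignored
```

The security of the whole construction is exactly the cost of obtaining `threshold` keys, which is why small fixed signer sets are the weak point Arjun describes.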
Kerman:
Sure. So a lot of the hacks that we've seen have been basically some sort of MPC-type scheme, where the signers got compromised somewhere in the mix, because they're programmatic, they're not humans, by definition. So it seems that design is probably the worst so far, because if some hot wallet somewhere in the world gets hacked, all the funds get lost. And it's almost becoming predictable now. It's a really scary thing, the size of the hacks. So we know that design space is kind of crap. Centralized exchanges aren't really that great either. But I haven't heard a lot about unbounded multisigs, where you have, say, a proof of stake network that is effectively a multisig. What's your take on that design space?
Arjun:
So basically, there's a range of economic security. All the way on one end, you have multisigs, and that also sort of includes LayerZero, because what LayerZero does is it has a relayer system that is a multisig and then an oracle system that is a multisig, and you just have these two overlapping multisigs. In theory, you're assuming that there's no overlap between these two sets, but in practice, there's no way to guarantee that. So in practice, what you end up with is basically the union of those two things, which is a multisig of uncertain size, which is a little bit scary.
Yeah. And then further along that spectrum, as you increase the number of participants, basically what happens is it gets harder and harder and harder to attack the system, right? Because a multisig that's three out of five, you only need to compromise three keys, et cetera, et cetera. Now, the difficulty with having any sort of multisig with a fixed set size is you can always find the people that are the signers on that and basically just pay them money, right? You could bribe them to be like, "Hey, if I pay you X amount of dollars, will you help me steal the money in this system?" And in fact you don't even really need to pay them to do that, because if there's $10 million in the system, there's 20 people controlling it, there is an incentive for those 20 people to steal funds from it no matter what.
It's almost like at that point, the static state of the system is that you are hoping for the benevolence of the signers, which is not great. Now, the way to make this a bit better is to make it so that there isn't a bounded set of multisig signers. When you have an unbounded set, a permissionless set, now it's harder to actually track down the people that you would have to bribe, and the economic security now comes from the staking power, the economic power of the chain itself, right? So can you force this chain to produce an invalid update through controlling a majority stake? That's strictly better, because you can also slash the stake if people try to cheat or commit fraud and things like that. But once again, there is a fixed value for the amount of economic security that that produces, right? There's a fixed amount of money that you could pay to bribe people in that system and to basically bribe the system to...
Kerman:
Cool. So then we zoom out in abstraction, we go to the far end of the spectrum, what is that final design space? It offers maximum security, but is of course a lot harder to pull off.
Arjun:
Yeah. So the other end of the spectrum is optimistic bridges and light clients. And I guess actually, all the way at the end of the spectrum, there's rollups and rollup bridges, and that's a special case I can talk about in a second. One way to think about this spectrum is sort of like the difference between side chains and rollups, where you can have a POA side chain, which is obviously extremely centralized and can be extremely easily corrupted. You can have a large scale POS side chain that is harder to corrupt. But in both of these cases, these have a different set of economic security trade offs than the main chain itself, right? It's hard to argue that a side chain, even if it's a very, very big and very successful side chain, has the same economic security as Ethereum, because it just doesn't. Its validators are not Ethereum's validators.
Now, that's kind of the big benefit that rollups confer: with a rollup, you have this system that acts kind of like a side chain, but the entirety of its economic security is derived from the economic security of Ethereum. Within reasonable assumptions, and those assumptions are basically that you can't attack the chain for more than a certain amount of time and that there's liveness of nodes and stuff like that, you could basically say that this has the same economic security as Ethereum.
Now, the reason that rollups are able to offer that is that the chain itself, the chain of the rollup, doesn't actually really exist, right? It's just this emulated environment that can be halted at any time. And so if something goes wrong, the rollup can be rolled back, or you could basically take action, and Ethereum is the source of truth. When you have two discrete chains that are not connected to each other, there's no absolute source of truth between the two of them. You now have a relative source of truth, and you have to figure out which one of those two things is correct. And so below the absolute best case, which is a rollup bridge, you have light client bridges and optimistic bridges. They're not as secure as rollup bridges themselves, but they're kind of the rollups of the bridging-
Kerman:
Sorry, just want to quickly confirm something. So we're on the far end of the spectrum, and one layer below, we've got two abstractions, one being side chains and the second being rollups. And then in rollups we have light client bridges, and the second one being...
Arjun:
Optimistic bridges. So what I'm saying is that in the trust spectrum, across the board, all the way on one end you have custodial exchanges, and all the way on the other end you have rollups, and everything in between is kind of what we consider to be bridges right now. So you have multisig bridges, POS bridges, optimistic bridges, light client bridges as you move across that spectrum. Now, those last two are actually really important. The reason is that, similar to optimistic rollups and ZK rollups, they give you ways to do bridging where you're relying on the security of the underlying chain to actually bridge.
In the case of light client bridges, what you're doing is verifying the consensus of the chain. One chain is basically natively verifying the consensus of the other chain. So you take a block hash, and then you take all of the stuff that went into the block, and then you run the consensus algorithm and figure out if you actually get that block hash. And if you do, if you can actually validate that, then the end result is that any update that was passed through the bridge is valid, because it comes from a valid block.
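The "recompute the block hash from its contents" step can be sketched in a few lines. This is a heavily simplified toy (JSON and sha256 stand in for a chain's real header encoding, and a real light client would also check signatures or proof of work per the chain's consensus rules):

```python
import hashlib
import json

def verify_header(claimed_hash: str, header_fields: dict) -> bool:
    """Toy light-client check: recompute the block hash from the header
    contents and compare it with the hash the bridge relayed."""
    serialized = json.dumps(header_fields, sort_keys=True).encode()
    return hashlib.sha256(serialized).hexdigest() == claimed_hash

# A made-up header: if any field is tampered with, the hash no longer matches.
header = {"parent": "abc", "state_root": "def", "number": 100}
h = hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()
assert verify_header(h, header)
assert not verify_header(h, {**header, "state_root": "EVIL"})
```

The point of the sketch is the shape of the check, not the encoding: validity follows from the hash, so no trusted signer set is needed on the receiving side.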
Now, there are challenges to this. It's not really possible to validate all forms of consensus on a blockchain. And in some cases, it's economically prohibitive. So for instance, Ethereum L1's proof of work is super memory intensive, so it's actually really expensive to try to validate that form of consensus on any other chain. And then you have other cases too. So you have things like Avalanche, which uses Snowball, a finality gadget.
And in order to have a finality gadget be emulated off chain, you would need a multisig or something like that anyway, so that would introduce trust assumptions. So it's not a silver bullet solution, and that's kind of the trade off of light client bridges: similar to ZK rollups, they're really hard to build and really custom. And in the case of light client bridges, it's not necessarily the case that they're always going to be possible either. They might just be unfeasible.
Kerman:
Yeah, because one question I have is, if you have a proof of work system, nothing's ever deterministic, it's all probabilistic based on how many confirmations you get. So how do bridges even work in that scenario? Is it just a best guess of like, "Hey, this block adds up, so we're just going to assume that it's good," or do you take multiple block confirmations? How do you actually navigate that?
Arjun:
Yeah. You have to take multiple block confirmations. So in all of these cases, you would need to make sure you wait enough block confirmations to be able to guarantee that finality has occurred, because that is a fundamental assumption in any kind of bridge: you have to assume that finality has occurred, at least enough economic finality, because otherwise the bridge operators will get rekt for sure.
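The confirmation-waiting rule is simple enough to state as code. A minimal sketch (the function name and the threshold of 12 are illustrative; real bridges pick chain-specific thresholds):

```python
def is_final(block_number: int, chain_head: int, required_confirmations: int) -> bool:
    """Probabilistic finality: treat a block as final only once enough
    blocks have been built on top of it, making a reorg past it
    economically implausible."""
    return chain_head - block_number >= required_confirmations

# Our transaction landed in block 100; the bridge waits for 12 confirmations.
assert not is_final(100, 105, 12)   # only 5 confirmations so far: keep waiting
assert is_final(100, 112, 12)       # 12 confirmations: safe to act on
```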
Kerman:
Sure. So there is basically a delay from a UX perspective there, where it's like the... Yeah, okay, cool. And this is where it's like, "Hey, we have a fast bridge. It's instant between Ethereum and Bitcoin." It's like, yeah, that's probably a red flag, because you can't do that unless you're a custodian somewhere. So by definition, bridges will never be instantly fast, unless you have an abstraction that lives outside of the system where there's a double accounting system somewhere. But like-
Arjun:
Yeah, you can do it in the case where the chain itself has instant finality. Eventually, with Eth2 constructions and things like that, Ethereum will have instant... That's actually not true. There was discussion about that, but that actually is not going to happen anymore. But I think there are other chains that have fast finality, and in those cases you can circumvent this window a little bit, but it is a bit of a challenge.
Now, the last kind of bridge construction actually also has another latency delay, which is interesting, because sometimes people will get confused between "let me wait for finality on the origin chain" and "let me wait for the bridge itself." So the last construction is optimistic bridges. This is the one that we and Nomad use, and it's becoming very, very popular now. Synapse today announced that they're pivoting to being an optimistic bridge, and I know there's a couple of other projects that have expressed interest in building their own optimistic bridges now. And I think the reason is that it seems to offer the best set of trade offs.
So optimistic bridges, the way that they work is, instead of just relaying data, having some subset of people relay data and then just saying it's valid immediately, an optimistic bridge relays data and then there's a certain period of time within which anyone can dispute. So it's sort of like an optimistic rollup, where anybody can make data updates on this chain, they get posted to Ethereum, and then if you want to exit the system, it takes seven days to exit. Anybody who wants to prove fraud has up to seven days to show that you committed fraud prior to exiting.
In the case of an optimistic bridge, the fortunate thing is that that latency can be reduced to about 30 minutes. And so what that means is, with an optimistic bridge, you can do this generalized communication between chains, and it can be deployed easily to any chain; you're not verifying consensus or anything like that. It's just a very, very simple set of contracts living in the EVM.
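The dispute-window lifecycle described above can be sketched as a small state machine. This is a toy model, not Nomad's actual contract logic; the class and method names are invented:

```python
DISPUTE_WINDOW = 30 * 60  # seconds; illustrative, matching the ~30 minutes discussed

class OptimisticMessage:
    """Toy model of a relayed message under an optimistic dispute window."""

    def __init__(self, payload: bytes, posted_at: float):
        self.payload = payload
        self.posted_at = posted_at
        self.disputed = False

    def dispute(self, now: float) -> bool:
        # Anyone may dispute, but only while the window is still open.
        if now - self.posted_at < DISPUTE_WINDOW:
            self.disputed = True
        return self.disputed

    def is_valid(self, now: float) -> bool:
        # The message becomes actionable only after the window passes undisputed.
        return not self.disputed and now - self.posted_at >= DISPUTE_WINDOW

msg = OptimisticMessage(b"mint 10 USDC", posted_at=0.0)
assert not msg.is_valid(now=60)                # still inside the window
assert msg.is_valid(now=DISPUTE_WINDOW + 1)    # window elapsed, undisputed
```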
But you have this trade off, which is it takes 30 minutes to send. Now, I guess one interesting and final point about this on the security spectrum is that there's a really [inaudible] thing about bridges that actually makes this a bit more complicated. So in the rollup space, you could pretty much statically say, "Okay, well optimistic rollups are amazing. They're easy to build and they're really fantastic. They give you good trust trade offs."
But everybody says ZK rollups are the holy grail. And ZK rollups are the holy grail because they let you have immediate confirmation as soon as proof data is posted on chain. The validity [inaudible] give you the best possible set of trade offs and they give you the best possible security.
The thing that makes this really complicated on the bridge side is that it is impossible to steal funds by attacking a chain itself. So if I 51% attack your chain, the worst case scenario is the chain forks. But with a bridge, that's different, because a bridge is basically the connector between these two chains. You could create an invalid state transition on the bridge and then spoof it into either of the two chains, to basically force one of the chains to believe that this happened.
From the chain's perspective, the bridge is basically an oracle, right? It's just a magical source of data where you just trust whatever's coming from it. There's no way to directly validate that the bridge itself is succeeding or failing.
Kerman:
Correct. Yeah.
Arjun:
So that's the thing that makes validity proofs, light client or even ZK based mechanisms for bridging, kind of dangerous for bridges specifically, as opposed to chains: on their own, validity proofs that relay data across chains aren't enough. If a chain gets 51% attacked, or someone manages to hack the contracts or somehow spoof data into the bridge on one chain, a bridge that is using validity proofs is just going to relay that data without looking at it. And that means that fake data, potentially resulting in funds lost and things like that, that hack gets inserted into Ethereum or another higher ...
Kerman:
Yeah, so basically you're saying validity proofs are bad because you basically have something that's mirroring two chains. And if something is right on one chain, it's like, yep, that's right on the other. But that's actually wrong, so rather than using validity proofs, you want to actually use fraud proofs because you want to see what's wrong, not what's right.
Arjun:
Kind of, yeah. And this was the concern that Vitalik posted about when he made his multi-chain, not cross-chain post. Because he was basically saying that even in a world where you have perfect bridges, an interconnected, bridge-connected world means you can 51% attack one chain to attack other chains, and that's bad. Yeah.
Now, optimistic bridges get around this. Like you said, you can check for what is wrong, right? You can check for a 51% attack, you can check to see if there was a contract vulnerability or something like that prior to making... Basically anyone can dispute if something has happened, and so it protects against these kinds of cases. And we generally feel that's actually safer overall, even though with an optimistic bridge you're not using a validity proof. So there's that time delay.
Kerman:
Sure, cool. That makes perfect sense. So how do fraud proofs work? Because I don't know if you could mathematically guarantee this. Is this a social-based consensus layer, where you've got a proof of stake network that's like, "Hey guys, something's wrong. Let's raise the flag and discuss around the table," or is there something more sophisticated than that?
Arjun:
Actually, you can prove fraud easily and deterministically on the sending blockchain. This is kind of interesting, because this is one of the ways in which optimistic rollups and optimistic bridges differ. So for optimistic rollups, there is one source-of-truth chain, but the fraud is occurring on the chain that is not the source of truth; it's occurring on the rollup. And so when fraud occurs, doing a fraud proof is hard. It's very interactive, it requires multiple rounds of proving and things like that. That's part of the reason why it's a one week delay.
However, restarting from fraud, so once fraud has occurred, being able to restart the rollup is actually quite easy because you have a source of truth, which is the Ethereum chain. And so once you've proven fraud, you basically just go with whatever Ethereum says and you restart from there.
In an optimistic bridge, it's actually the opposite. So it's really easy to prove fraud. That's why we can do it in 30 minutes. It's just a single transaction. Anybody can prove fraud on the home chain, which is the chain where a transaction is sent from. And the reason you can do this is because in an optimistic bridge, you have a role similar to an optimistic rollup's sequencer: the updater. The updater is responsible for taking data from one chain and putting it on the other. And when the updater takes that data, they sign it and they post it on chain.
So if there's anything wrong with that data, the updater is on the hook for it, and it's pretty easy to show that something is wrong with that data on the origin chain. Because if there is an invalid update, it's basically just a Merkle root. So you can see the data that went into the Merkle root and you can see the previous root, and so if there was an invalid state transition, you would end up with a different root.
It's very easy to just compare roots and figure out whether that invalid state transition happened. Now, what's more difficult in optimistic bridges is how you restart. So you can prove fraud occurred, you can slash the updater, and you can basically stop the system. But restarting it is more difficult, because restarting requires social consensus. That can be a little bit more complex, because you need to know when it is safe to restart and from what update you should [crosstalk]-
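The root-comparison fraud check Arjun describes can be sketched with a simplified binary Merkle tree (the function names are invented, and real implementations use domain-separated hashing and incremental trees, but the "recompute and compare" idea is the same):

```python
import hashlib

def merkle_root(leaves):
    """Compute a simple binary Merkle root over a list of byte strings."""
    level = [hashlib.sha256(l).digest() for l in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def is_fraudulent(signed_root: bytes, messages) -> bool:
    """Fraud check: recompute the root from the messages actually sent on
    the origin chain. Any mismatch means the updater signed an invalid
    update, and the signature itself is the evidence used to slash them."""
    return merkle_root(messages) != signed_root

msgs = [b"send 10 USDC to bob", b"send 5 USDC to carol"]
honest_root = merkle_root(msgs)
assert not is_fraudulent(honest_root, msgs)
assert is_fraudulent(honest_root, msgs + [b"mint 1e9 USDC to attacker"])
```

Because the origin chain holds both the messages and the signed root, this comparison is a single deterministic transaction, which is why fraud proving here is so much cheaper than an interactive rollup fraud proof.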
Kerman:
So when you have a contract hack that's technically a valid state transition on the origin chain, how do you basically solve that problem? Because it is a valid state transition, but it's not the state transition we'd like to happen.
Arjun:
Yeah. So that's a part of [inaudible]. When you hack bridges, the danger is that you can take a state transition on one chain and push it back to the chain where all the funds are locked, to unlock the funds. And then all of the assets that have been minted through this thing become unbacked, right? And this is the big risk with Wormhole that everybody was scared about, because it was $300 million worth of assets on Solana that would now be unbacked.
It's unclear whether the entirety of Solana would have gone down. With an optimistic bridge, even if you can't necessarily prove that fraud occurred, you can still disconnect the system. So it is still possible to set things up in a way where the watchers that are watching the system can just see, okay, the total amount of capital in the system isn't balancing out: the amount of capital locked on the origin chain needs to match the capital that has been minted on every other chain put together. And if those two things are not [inaudible] you can just disconnect. Yeah, exactly.
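That collateralization check is a simple invariant a watcher can evaluate continuously. A minimal sketch (function and chain names invented for illustration):

```python
def invariant_holds(locked_on_origin: int, minted_by_chain: dict) -> bool:
    """Watcher sanity check: the total of minted representations across all
    destination chains must never exceed the collateral locked on the
    origin chain. If it does, something minted unbacked assets and the
    bridge should be disconnected."""
    return sum(minted_by_chain.values()) <= locked_on_origin

# 1000 units locked, 1000 minted across two chains: balanced.
assert invariant_holds(1000, {"chainA": 600, "chainB": 400})
# An extra 100 minted somewhere with no backing collateral: disconnect.
assert not invariant_holds(1000, {"chainA": 600, "chainB": 500})
```

Note this catches the symptom (unbacked mints) even when the triggering state transition was "valid" in the contract's own terms, which is exactly the hack case being discussed.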
Kerman:
Right, okay. So it's [inaudible]-
Arjun:
So it's easy to check for-
Kerman:
Rough social consensus, but you're still using basic math to really see that these numbers should add up, and if not...
Arjun:
Yeah. And this actually happened, by the way. So yeah, there's been a lot of multisig bridge hacks. And one of the things that we've always talked about is you can take any of those multisig bridges and turn them into optimistic bridges. It's not very difficult. Really, you just have to add a set of watchers and a delay; the watchers see all the transactions that happen and then stop any transactions that look funny.
And this actually happened. So there was a slew of multisig bridge hacks, with Ronin and Horizon and everything, but there was one that not a lot of people realized or talked about, which was the Near Rainbow Bridge hack. The Near Rainbow Bridge actually almost got hacked. It would've been a huge, huge hack, because I think they had billions of dollars in there. And it was interesting because there was actually a vulnerability in the contract, so it was not even a security model vulnerability; it was a vulnerability in the contract, which is really unavoidable.
But the Near Rainbow Bridge is an optimistic bridge and the bridge watchers actually disconnected the bridge and stopped the hack from happening before there was any downside. Yeah. So all of the funds were saved.
Kerman:
How cool.
Arjun:
And this was the only instance where a bridge hack was actually just completely stopped as a result of...
Kerman:
Completely, that's so cool. So one thought's really running through my brain as I abstract all of this out. It essentially feels like a design pattern. Because blockchains are so unique in their security models and design trade offs and whatnot, trying to have something that models the exact specifics of the source and the destination, and whatever abstract system we're dealing with in a distributed network, is going to be too hard to model out.
But rather, if we have a model where we just assume, hey, let's just trust anyone to push something through and get shit done, but if we can prove that the thing they did was wrong, then let's adopt that. And it reminds me a lot of the Polkadot parachain watchers model, or Fishermen. I feel like that's where that paradigm first came in, but it feels like we can extend that paradigm to more things.
But I don't know, I just find it philosophically a very interesting way of thinking: hey, let's just let anyone push things through, but then we prove things are wrong. In this model, the only downside is you have to wait; everything isn't instant. And I kind of extrapolate: right now we have a culture in society where people want things instantly and quickly and now, but it almost feels like we're going to have to learn how to not want things instantly, because quite literally good things take time.
Arjun:
Yeah. Yeah. Well, I think the good thing is there are actually ways to get around that latency, and that's really what we do. Connext and Nomad are basically two bridge projects. And I think sometimes people get a little bit confused as to the delineation between us two. And one way to think about it is that we're sort of the liquidity layer and Nomad is the messaging layer. But another way to think about it is that we're like the fast execution layer on top of Nomad.
So when you send messages through Nomad, it takes 30 minutes for them to basically become valid on the destination chain. Now, they're posted immediately. They're posted to the chain immediately, but there's a 30 minute delay within which anyone could come and dispute that message. Because they're posted immediately on both chains, if you're an off chain watcher, if you're just a person observing the system, you know right away if fraud has occurred. The 30 minute window is not the amount of time it takes you to know if fraud has occurred; it's just the amount of time it takes for anybody to post that data to chain and prove fraud has occurred on chain.
What that means is, the way that Connext works is we actually have a network of off-chain actors. And what we do is we front capital and execute transactions faster. We basically short circuit Nomad's system. And this works in any user-facing interaction, where Connext's routers, which are the nodes in our network, basically see a transaction that's going through Nomad, and, similar to something like Hop, they'll front the capital to the user and then claim against the transaction from Nomad.
And they're able to do this safely because off chain, they know if fraud has occurred right away. So they can know, okay, this transaction is totally valid. In 30 minutes, this transaction is going to get to the destination. Let me just send funds to the user, maybe call Uniswap on their behalf or call whatever contracts on their behalf, execute that right away in under two minutes, and then at some point in the future, when the 30 minute latency is up, I can claim against the transaction from Nomad.
Kerman:
So then what is the risk that you guys incur? Because what's the trade off being made here is what I'm trying to understand. Yeah.
Arjun:
There's actually no risk to that, which is kind of cool. I mean, just as an implementation detail, there's risk for our routers who are holding assets, because they're holding Nomad-flavored assets, but that's fine. Those routers are acting as watchers anyway, so they're curtailing that risk by themselves policing the system. But aside from that, the risk is actually non-existent. It's similar to when people build fast exit systems for Optimism, where you allow someone to swap funds from Optimism onto Ethereum to bypass the seven day window. You're not actually incurring any risk when you do that, because the service provider that is giving that service, allowing me to swap, is validating the transaction, validating the data from the rollup, and validating the proofs that have been posted to Ethereum. And as long as there's equivalent-
Kerman:
No, no, sorry, yeah. I think my question was wrong. It wasn't what's the risk; it's where does the gain come from here? It feels like intuitively, somewhere you're relying on computational power off chain that isn't available on chain, hence you can make that calculation. It's not fully clear in my head. So there's a new variable in this equation that allows you to have this without any risk, and I'm just trying to figure out what that is.
Arjun:
Yeah. So the question basically boils down to why does that 30 minute delay exist on chain, right?
Kerman:
Correct.
Arjun:
And the answer to that is because, in the worst possible case, we expect that it could take up to 30 minutes for someone to send a transaction to the chain to prove that fraud has occurred. It's basically to protect against the case where a malicious attacker tries to congest the chain or tries to censor your fraud proofs, right? However, off chain, you know right away. It's like the chain itself takes 30 minutes to know, but everybody else-
Kerman:
Right, okay, cool. So that's the variable, because you've probably got some sort of mempool type system that's queuing everything and that fraud proof could get lost in the noise effectively. But off chain you can basically see everything because you have more computational power.
Arjun:
Okay. I guess, let's go into the details of what's actually happening, right? So when you send a message through Nomad, you post some data on the sending chain, say that's Optimism, and that data is, say, a transfer of 10 USDC. So that message goes into Nomad and it gets hashed on the sending chain, right? So you generate a hash of that message, it gets added to a Merkle root, and then the updater signs that Merkle root, and that gets posted on Optimism.
So there's a record on Optimism of here is a signed root that attests that the updater has said, "This is totally okay, no fraud has occurred here." Now of course, they could be lying at this point, but that's what they have attested to. Immediately afterwards, that root just gets picked up and relayed across chains. That happens through a relay system that, it's not super sophisticated. You basically just take that data and post it on another chain.
Now, the data has been posted to the receiving chain. The receiving chain just has received some blind data with a signature on it. It doesn't know whether that data is valid because it's not the sending chain. All it knows is that this data has been signed by the updater, so it's not possible for the relayer to actually manipulate the data, it's only possible for the-
Kerman:
Correct, yep.
Arjun:
Exactly. So that data is now available on the receiving chain, but it may be false. And what you do is you start a 30 minute timer, within which someone can come and prove that it is false. Now, if you're an external actor in this system, if you're just watching both of these chains, what you're seeing is: on chain A, you see Kerman made a transaction of 10 USDC and that created this signed root. And on chain B, you see the same signed root, right? If the signed root is incorrect, then you'll know right away, because all you're doing is checking to make sure that hash was generated correctly. And if the signed root is correct, you can immediately know that no fraud has occurred. You don't need to wait for that 30 minute window. You can immediately know, and you can do something with it.
Now, if fraud has occurred, the idea behind the 30 minute window is I'm a watcher and I need to send this transaction to Ethereum, but maybe it's congested. And so maybe it just is going to take me up to 30 minutes to send that transaction or maybe it's possible that the bridge provider is explicitly trying to DoS the chain and censor me, and so it may take up to 30 minutes to send that transaction. So that 30 minute window is really just an amount of time within which fraud could be proven. The fraud could be proven immediately. It could be proven in-
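The checking loop Arjun describes can be sketched in miniature. This is a toy Python illustration, not Nomad's actual scheme: `leaf_hash`, the pairwise Merkle tree, and the bare `signed_root` comparison are all simplifications, and a real watcher would also verify the updater's signature over the root.

```python
import hashlib

def leaf_hash(message: bytes) -> bytes:
    """Hash a cross-chain message into a Merkle leaf."""
    return hashlib.sha256(message).digest()

def merkle_root(leaves: list) -> bytes:
    """Fold leaves into a single root with a toy pairwise tree."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:                    # duplicate the last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def watcher_check(origin_messages: list, signed_root: bytes) -> bool:
    """Off-chain watcher: recompute the root from the origin chain's messages
    and compare it with the root the updater signed and relayed. A mismatch
    means the updater attested to a fraudulent root."""
    return merkle_root([leaf_hash(m) for m in origin_messages]) == signed_root

# Honest case: the relayed root matches what the origin chain produced.
msgs = [b"transfer 10 USDC to the receiving chain"]
honest_root = merkle_root([leaf_hash(m) for m in msgs])
print(watcher_check(msgs, honest_root))       # True -> no need to wait 30 minutes

# Fraud case: a bogus root shows up on the receiving chain -> caught instantly.
print(watcher_check(msgs, b"\x00" * 32))      # False -> submit a fraud proof
```

The point of the sketch is the asymmetry: the watcher learns the verdict the moment both roots are visible, while the 30 minute window only exists so the fraud proof transaction can land on chain under worst-case congestion.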
Kerman:
Right, okay, cool. So I'm guessing you've used 30 minutes because there's some amount of theoretical cost to spam the network for that long, or some sort of economic value that's so ridiculously high that you can use 30 minutes as a safe benchmark as a time window.
Arjun:
Exactly, yeah. There was a bit of controversy around that when we first started working with that number, and Nomad was the one that proposed it. But they actually did some pretty interesting analysis around this, where they found that the cost of censoring a chain for that long is incredibly high, and the odds of doing it successfully are incredibly, incredibly low. Economically, it makes no sense to even attempt it. It's kind of silly how ridiculously low those odds are. Now, of course everybody's like, "Okay, well optimistic rollups have a seven day timeout, why can optimistic bridges get away with only 30 minutes?" Part of that is the interactivity. So optimistic rollup fraud proofs and the fraud proving process are just way, way more complex and require way more back and forth and things like that.
And in the case of Arbitrum for instance, and I guess Optimism has also gone in this direction, in order to maintain EVM equivalence, in order to allow for their block sizes to be the same size as Ethereum blocks, they actually need multi-round fraud proofs where you only prove fraud of a certain part of the block's contents at a time. And so you actually need to go back and forth many times.
Kerman:
So just way more computationally intensive?
Arjun:
And time intensive, right? Because after each one of those things, you have to wait a certain amount of time, and then you have to do the next batch and you have to do the next batch. Whereas in the optimistic bridge case, you're just sending a single transaction so that's a lot simpler. And then the other piece of that is that I think the seven day window was really just something that was... I mean we're part of the L2 research community. I remember when the seven day window was proposed, it was really just sort of a hand wavy thing where people were like, "All right, seven days, that's enough." It's a long enough time, it's going to be really difficult to attack Ethereum for that amount of time. It came from state channels disputes as well, where people were like, "Yeah, seven days is enough to handle a state channel dispute."
Now in practice, when you do the math with EIP-1559: if you fill up the gas of an entire block, the gas cost of the next block increases by 14%. If you don't fill up the entire block, then a fraud proof transaction could get in, and so fraud has been proven. If you do fill up the entire block, the next block's gas cost increases by 14% again. So what that means is the cost that you pay ends up being ridiculous amounts of money very quickly. Yeah.
Kerman:
Oh yeah, because 30 minutes, what? Like, let's say six blocks a minute, you've got 180 blocks, each increasing by 14%, is 1.14 to the power of... Yeah.
Arjun:
Yeah.
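The back-of-envelope math in this exchange checks out numerically. A quick sketch, using the ~14% per-block figure and ~6 blocks a minute as quoted in the conversation; note that EIP-1559's actual cap on base fee growth is 12.5% per consecutive full block, which still yields an astronomical multiplier:

```python
BLOCKS = 30 * 6             # Kerman's count: ~6 blocks a minute for 30 minutes

quoted = 1.14 ** BLOCKS     # the ~14% per-block figure quoted here
actual = 1.125 ** BLOCKS    # EIP-1559's real cap: base fee grows at most 12.5%/block

# Multiplier on the base fee after 30 minutes of keeping every block full:
print(f"at 14%/block:   {quoted:.2e}x")    # ~1.7e10
print(f"at 12.5%/block: {actual:.2e}x")    # ~1.6e9, still astronomically expensive
```

Either way, the attacker's per-block cost compounds exponentially, so censoring fraud proofs for a full 30 minutes is economically absurd.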
Kerman:
Right. That's so cool. It's funny, I find in crypto there are all these kind of, I guess, notions that people just go with, but no one really questions why. And the more you question why about something, you realize it's just kind of made up just because.
Arjun:
[inaudible] made a decision, yeah. And I mean, it makes sense. A week is a long enough time. Everybody was like, "Yeah, there's going to be fast exits, it's fine. We don't need to worry too much about it." And they're right, they probably don't. But I think realistically, I mean, the multi round fraud proof complexity exists for optimistic rollups and there are probably trade offs that will always exist around that.
But if you had a simpler fraud proof mechanism for optimistic rollups, you could probably cut that down to one day or less with no problems. I mean, Hop, for instance, as a bridge bypasses the optimistic rollup exit window and uses a one day window instead. And it's good enough. It is good enough.
Kerman:
So one thing I think about philosophically is how do you know something bad is happening? And is it always enough to know something bad has happened within 30 minutes? Because you're effectively relying on machines and rough heuristic math to be like, "Yep, this roughly checks out." But sometimes there are things that aren't encoded into the rules of the machines to know if something has gone wrong. And in that instance, one week is actually nice because human beings typically find out within that window. I mean, technically we usually know within about 24 to 48 hours, but 30 minutes means you're purely relying on machines, which carries its own risk. So how do you think about that?
Arjun:
Yeah. I mean, the counterpoint to that is that when the Ronin bridge was hacked, it was days if not a week before anybody even realized, which is a really, really big problem. I mean, it's terrifying. But I think it's just like, no matter what happens, in all of these systems, you can't really... The human element is actually usually the fallible element, right? You need to have alerting systems and stuff like that no matter what, to be able to monitor this stuff. Because human beings on their own are not going to look at these things and be like, "Oh, something happened." They're just not, it just doesn't happen.
So yeah. I mean, I think the other piece of that is that with optimistic bridges, there are only certain kinds of cases that you really have to watch out for, which is basically some invalid state transition has been created either in the bridge itself or on the origin chain. So what you're checking for is 51% attacks or hacks of the contracts or fraud. That actually encompasses all the possible set of things. And all of those three things are not that difficult to check for off chain, as an off chain actor.
I mean, yes, in theory it's possible that there are bugs in the off chain code that result in these kinds of problems, and that's definitely an issue. Generally downtime is not as much of an issue, and I think the reason for that is the mathematics are actually similar to the gas mathematics. Say you have a hundred watchers watching an optimistic bridge, which is a reasonable number.
In our case, we have 50 routers. So all 50 of our routers will be watchers. We also have another 150 on testnet right now, so that's 200 watchers out of the gate. It's not an unreasonable number to have a hundred. Say you have a hundred watchers watching the chain and they have 70% uptime, right? The odds of finding a moment where there are no watchers online are like 0.3 to the power of a hundred.
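Arjun's estimate is easy to verify with the numbers quoted above (100 watchers at 70% uptime each), under the simplifying assumption that watcher downtime is independent:

```python
WATCHERS = 100
UPTIME = 0.70

# Probability that every single watcher is offline at the same moment.
p_none_online = (1 - UPTIME) ** WATCHERS    # 0.3 ** 100
print(f"P(no watcher online) = {p_none_online:.1e}")   # ~5.2e-53
```

That probability is so far below anything physically meaningful that, in this model, watcher liveness is effectively never the weak point; correlated failures (a shared bug or infrastructure outage) would be the realistic concern.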
Kerman:
Yeah. And I mean, if you have this paradigm of watchers everywhere, it feels like just your phone or anything, any device that can just check something really simple is going to start having more utility in the future as we adopt this design paradigm of generally trust everything is okay but just watch to make sure nothing is terribly wrong.
Thank you for exploring all those design spaces and whatnot, because I think you see a bunch of stuff on Twitter like, "Oh, our bridge is better than yours," or all these very personal kind of things, but no one really goes through the specifics of the different design trade offs. So if we kind of zoom out for a second and we look at the state of the bridge industry, why do you think people aren't adopting this way of thinking about the problem? Or what would you say is a critic's view of this way of thinking?
Arjun:
I mean, I think there's still a large segment of people in the space that are just like, "Yeah, nobody cares about trust minimization," and I think that segment is usually very large during a bull market and tends to die out during a bear market. And I think that's a big part of what happened: there was a bull market, bridging was a thing that became really hyped in the bull market, and so a lot of people built very trusted bridges because they were just like, "Let's capture as much market share as we can," and also just because the information asymmetry is massive, and so lots of people are like, "Yeah, we don't know anything about bridges. We're just going to use this UI and see what happens." Nobody knows if it's custodial behind the scenes, right?
I think now things are changing. More and more people are starting to recognize the value of trust minimized bridging, and they're starting to recognize that this is a huge, huge systemic issue for the space. And as I mentioned, Synapse has already pivoted to being an optimistic bridge. We've heard from a couple of other projects also that are in the process of doing the same, or at least remarketing what they're doing as being an optimistic bridge, if they haven't fully pivoted.
And then I think overall, I expect this space to continue in that direction. I think the bar, the threshold for security for these things, is only going up. And what that means is that projects that either are dishonest with their users about their security trade offs, or don't actively try to think about those trade offs and mitigate them, are probably not going to do as well in the long term, because people do care. End users don't care, but developers do, interfaces do. Nobody wants to be the next Ronin, nobody wants to be the next Horizon.
Kerman:
Completely. It's really funny. You see this with the stable coins, right? Users are like, "Yeah, the stable coin's great," but any project developer is like, "Yeah, your stable coin is not a real stable coin. You can fuck off." So yeah, it's really the developers, and I think it's, I guess, the builder community, whatever that means, to kind of make sure that we're holding up or being honest about what are the realities of the economic guarantees of whatever systems we're building and pushing onto users. Because if not us, users will chase the thing that always earns them the most money.
Arjun:
Yeah, yeah. I mean, I think one of the key things is what bridges unlock. I know that we've talked a lot about bridge security and about just the landscape of the different kinds of bridges that are out there, but I think it's important to also contextualize this: why do these things matter, right? Is it just simply asset transfers across chains or simple kinds of communication across chains, minting an NFT across chains or things like that? In my opinion, the reason that so many people are excited about bridges, and why there are both so many high value projects and also so many low value projects contributing to this space, is because I think the end state for bridges is that they will actually be one of the key infrastructure pieces of the puzzle for blockchains.
And the reason for that is that what bridges and cross chain communication are enabling is asynchrony. It's an asynchrony primitive. So think about the way that you build applications on a blockchain today. There's sort of the AP computer science version of building applications, because it's all on a single system, there's no asynchronous communication. You're basically just dealing with the state that is in the block that you're interacting with, and that's basically it. And so there are plenty of smart contract hacks, but in general, developers can get better at Solidity and things like that, but the sophistication of the things that you build is actually relatively fixed, right? It's just things that exist within this block.
Whereas what bridges and cross chain communication allow you to do is basically have asynchronous calls to contracts and systems on other chains. And this is similar to making the transition from building normal single computer applications to building web applications, where when you're dealing with JavaScript or TypeScript, you're now dealing with asynchronous calls that are basically remote calls through an API to some other service. And you don't know when that data's going to come back, you don't know what's going to come back, but you have a callback so that when it does come back, you can go and execute something.
This is what we are trying to push towards, this is what we're enabling. So with Connext and Nomad for instance, we're in the process of doing a network upgrade that basically tightly couples Connext and Nomad together and then enables fully expressive cross chain application [inaudible]. And what we see is that the future of this space is going to be applications that actually touch multiple chains all at once. It's just applications that are interacting with resources everywhere all at once, just like web applications.
And the core primitive that we built for that is called xcall. It's a lower level Solidity primitive that maps to the lower level call function. But call basically lets you call a contract on the same chain. You just give it the ABI encoded calldata with the function signature, and then you just call a contract. Xcall lets you call a contract asynchronously on a different chain, so-
Kerman:
That is sick.
Arjun:
Basically it's a call. And then, you can also attach a callback to that. So once you xcall something on a different chain, you can get the data back and then complete a callback. Yeah.
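The xcall-plus-callback shape Arjun describes can be mimicked with ordinary async programming. The sketch below is a toy Python analogy, not Connext's API: the `chains` registry, `deploy`, and the `xcall` signature are all invented for illustration, with `asyncio.sleep` standing in for cross-chain relay latency.

```python
import asyncio

# Toy in-memory "chains": chain name -> {contract name -> async handler}.
chains = {"optimism": {}, "arbitrum": {}}
results = []

def deploy(chain, name, handler):
    chains[chain][name] = handler

async def xcall(dest_chain, contract, calldata, callback=None):
    """Call a contract on another chain asynchronously; the message takes
    time to relay, and an optional callback fires with the returned data."""
    await asyncio.sleep(0.01)                 # stand-in for relay latency
    result = await chains[dest_chain][contract](calldata)
    if callback is not None:
        await callback(result)                # "callback" leg back to the caller
    return result

async def main():
    async def price_oracle(asset):            # a contract living on the far chain
        return {"ETH": 2000}[asset]

    deploy("arbitrum", "oracle", price_oracle)

    async def on_result(price):               # caller-side callback
        results.append(price)

    await xcall("arbitrum", "oracle", "ETH", callback=on_result)

asyncio.run(main())
print(results)                                # [2000]
```

The analogy to web development is exactly the one drawn above: you fire the call, you don't know when or what comes back, and the callback is where the answer lands.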
Kerman:
Oh my God, you guys are mad geniuses. That is sick. Yeah, it's like, I feel a really simple marketing slogan for you guys is "We're enabling JavaScript for blockchains."
Arjun:
[inaudible].
Kerman:
Yeah, and when you kind of zoom out then... So [inaudible] kind of posits that you can't have a cross chain future, you can only have a multi chain future. So it sounds like you think a cross chain future is possible, it's just maybe a bit slower to arrive than we'd like. It's still going to be a bit rough, at least in the current state of blockchains, because you're only as good as the consensus layer of the chain that you're building on. But as chain consensus becomes better across the board, except for Bitcoin, because Bitcoin is never going to change, we'll have these asynchronous calls.
So when you think about the compute stack of blockchains, it's not going to be like you're interacting on layer three that interacts with layer two that interacts with layer one. It's more like you're probably going to have some signed message that's executing something on one part of the stack, and then that stack is going to communicate with other systems, and they're all going to be networked together. But it's not hierarchical, it's more composable in some sense. So how far away do you think we are from that future?
Arjun:
So we built this primitive and we have some ideas on how it could be used, and we have a lot of projects already kind of building with it. But the difficulty is that we don't yet know... In general, this space doesn't yet know a ton about how to deal with concurrency across blockchains. We have existing understanding of concurrency control and stuff like that from normal distributed systems, but with bridges and blockchains, you have very high messaging latency. So things like locking systems don't really work that well, because those cost too much to implement on a blockchain. It's just too complex. So I think something that we need to figure out is what it's actually going to look like, what the right kinds of design patterns are, what kinds of projects are actually going to succeed in this paradigm, and how they should be built.
So I think we're very close to the point now where many developers are going to start experimenting with these primitives. But I think we're actually, surprisingly, probably about a year or two out from when we see real cross chain applications taking off, because I think in that time there's going to be a lot of experimentation to figure out what kinds of structures actually work, stay resistant to MEV across chains, and don't result in state collisions. Because you don't have to worry about that on a single chain, right? When you're building a Solidity application-
Kerman:
It's synchronous. Yeah.
Arjun:
Yeah, it's synchronous, so you don't have concurrency control, you don't have to worry about it. For better or for worse, I think a lot of Solidity developers are relatively newer developers. They're coming from a web dev paradigm where they've worked for a couple of years, or a lot of them are just people coming straight into this industry and learning Solidity as their first programming language or one of the first programming languages that they know, and they don't have a ton of experience with concurrency and things like that. So it's going to be a big learning curve for those folks, for sure.
Kerman:
Yeah. It feels like we'd almost need... When you think about all the people who've designed really primitive databases, like Redis, MongoDB, those really kind of tried and tested engineers who understand concurrency and the problems with locking in databases and the trade offs, you really need to know, like, feel a database. And once we have a trove of those people coming in, that would probably accelerate the space. Those people are probably 40 to 50 and they probably don't care about blockchain, so it feels like the inevitable path is we're going to have to make all the mistakes as this generation. Then once enough people have made those mistakes, we'll be able to truly understand that paradigm.
So it feels like the limiting factor here is really just the understanding of the talent that's coming in, which actually comes down to developer education. And it's not like a, "Hey, you just ship this [inaudible] front end fast and it's going to work." It's like, no, you actually should think about this from a first principles perspective, understanding your design trade offs at every frame of execution, and then think about the system as a whole. And that's a lot more thinking than writing code, which I think today's developers aren't really used to.
Arjun:
Yeah. Yeah, absolutely. I mean, I think it's good that we have this slow process to build Solidity applications, with auditing and things like that. Auditors' jobs are definitely going to get a lot harder once this happens. I mean, even talking to our current auditors that are auditing our system, that are starting to get more familiar with how to think about audits of systems that touch multiple chains, they're like, "Wow. This is a whole new paradigm of complexity that we just haven't considered," right? Concurrency control is just something that was completely off the table, and there are going to be a lot of concurrency related bugs and hacks that happen, which is going to be quite brutal.
Kerman:
Who is so specialized that they understand consensus mechanisms, understand Solidity or EVMs, and then understand concurrency? It's like, you're speaking of less than a hundred people globally at this point, and the amount of money at stake on the table here is absolutely insane. So-
Arjun:
It is.
Kerman:
I guess the question is... I feel it's not going to be possible. If you kind of abstract out a ton of layers, it's not going to be possible to get people with those specializations. So it feels like there need to be better designed insurance primitives, where you just don't really have to care about the audits and you just know there's some financial profit modeling that makes sense even if everything were to go to zero. Because relying on auditors just doesn't feel feasible moving forward.
Arjun:
Yeah. I mean, I think auditors are super important, and getting a security audit is a really important part of the process. But yeah, more and more, I feel the incentives are kind of whack. What really should be happening is you should be insuring the contracts, and the people underwriting that insurance should be the auditors, who are now putting their money on the line by making sure that this is secure, because now you're incentivized to make sure that it's secure. I don't know if such... I mean, obviously that model doesn't exist yet. I don't know if it will exist [inaudible]-
Kerman:
Have you heard of Sherlock?
Arjun:
Feasible, but it's definitely interesting. I haven't.
Kerman:
Yeah, so they're actually doing something similar. I mean, I [inaudible] angel invest [inaudible] a while ago, but I'm [inaudible] because that's just actually their model. And it's something that I've thought a lot about, because it almost feels like in the future, if you have some sort of protocol or application, they're going to try... Let's say a 10% fee in some capacity and then 2% is your insurance cost, and it just looks like a traditional business: yep, our liquidity costs are 2%, our insurance is 3%, our net margin is 5%. Cool, this is how we make profit. Here's how the unit economics are modeled out. And that to me feels like the future. But it's not like, oh yeah, we waited four months and we spent 200 grand to get an audit, but there are no guarantees that the thing is safe, use at your own risk. That just feels like [inaudible]-
Arjun:
Rubber stamp, yeah. It does, yeah.
Kerman:
Exactly. But anyway, dude, this was a ton of fun. Thank you so much. I really enjoyed this. And yeah-
Arjun:
Likewise. Thank you so much for having me.