Chaos Lever Podcast

Ned and Chris talk to Doug Madory about changes in BGP since the mid-1990s.

The More Things Change, the More BGP Changes a Little Bit

Ned and Chris dive into the evolving landscape of BGP with Doug Madory, the Director of Internet Analysis at Kentik. Despite the rapid transformation of the internet since the mid-1990s, BGP remains largely unchanged, leading to a rise in routing hijacks and user errors. Doug discusses how automated filters and cryptographic tools like RPKI ROV are mitigating mistakes and improving security. He explores the potential of BGP solutions in reducing global issues and the importance of initiatives like ASPA. The guys also get Doug’s take on significant events like the Allegheny/Verizon incident and the FCC's ongoing efforts to enhance BGP security.

Links

Kentik: https://www.kentik.com/
Twitter: https://x.com/DougMadory
LinkedIn: https://www.linkedin.com/in/dougmadory/
Kentik blog: https://www.kentik.com/blog/

What is Chaos Lever Podcast?

Chaos Lever examines emerging trends and new technology for the enterprise and beyond. Hosts Ned Bellavance and Chris Hayner examine the tech landscape through a skeptical lens based on over 40 combined years in the industry. Are we all doomed? Yes. Will the apocalypse be streamed on TikTok? Probably. Does Joni still love Chachi? Decidedly not.

Ned: This is going to be a little bit of a different show, isn’t it, Chris?

Chris: Why?

Ned: Because we’re not alone. There’s someone else in the room.

Chris: Do I need to call an adult?

Ned: Are you not an adult?

Chris: Have we met?

Ned: No, that’s fair. Hello alleged human, and welcome to the Chaos Lever podcast. My name is Ned, and I’m definitely not a robot. I’m a real human person who needs to monitor their dihydrogen oxide intake and output to make sure it stays within normal tolerances. Just like you. I definitely do not consume the blood of humans to stabilize my quasi-organic internals. That would be vampirically ridiculous. With me is Chris, who is also here. Hi, Chris.

Chris: I mean, when you say things are ridiculous, it always sounds like you don’t mean ridiculous.

Ned: Maybe [laugh]. As I alluded to, we also have a guest joining us, which is very exciting. Our guest is Doug Madory. Is that how you say your last name, Doug?

Doug: Yep.

Ned: Man, I got it on the first try. That’s awesome. Doug is the director of internet analysis at Kentik, and he has definitely forgotten more than I’ve ever learned about BGP routing, and the internet. Welcome to the show, Doug.

Doug: Hey, glad to be here. Thanks for having me.

Ned: Absolutely. A few weeks ago, we talked about BGP a bit, and we established that—well, if you haven’t listened to it, probably go and listen to it, especially if you’re not familiar with BGP at all—but it’s a protocol that was initially designed when the internet was a small, friendly place full of nerds who just wanted to get things connected. Everyone kind of knew everyone, and asking others not to be assholes about it just kind of worked at the time. And that’s true of so much of the early internet. SMTP, HTTP, FTP and other protocols didn’t have security, even as an afterthought.

Essentially, everything was sent in plain text and trust was just assumed, and that was the world of the mid-1990s. Thirty years later, the internet is a very different place, and somehow BGP is… largely the same? So, let’s talk about that. Doug, so BGP was originally based on building relationships between neighbors, exchanging network layer reachability information, and it all had this assumption of trust. When did that start to become a problem?

Doug: So, I mean, it’s still inherently the same, as far as that goes. I started in this space in the year 2009, so I’ve been doing this for about 15 years. In that time, there’s been any number of cases of either, like, deliberate routing hijacks in order to disrupt or misdirect traffic or, more frequently, mistakes. A lot of people will fat-finger something on a router, causing a large internet outage. Those are a couple of the problems that we deal with in the BGP world.

Ned: Okay. So, you’ve sort of—there’s two different, I guess, things that people have to be worried about, that you brought out there. One is, like, malicious activity, trying to mess with neighbors, and redirect traffic in some way, and others are just, like, “Man, I had a—it was Friday. I was trying to get out. I hit a command, and uh-oh, I’ve blown up half the internet.”

Doug: Yeah. I mean, that was pretty frequent. I would say it was a little too frequent when I was getting started in the space. And I would say we’ve made a lot of progress on that category of building some belts and suspenders to try to improve that side of the problem. What we in this space call the other side is the determined adversary. So, someone who is an attacker who’s very knowledgeable of what security mechanisms have been deployed, how they work, what are their weaknesses, and so the determined adversary is kind of an unsolved problem thus far in BGP.

Ned: Okay. So, you mentioned that there’s some things that have been put into place to help with the fat-fingering problem. What are some of those controls or ways that we’ve tried to fix that half of things?

Doug: Yeah, sure. And when I talk to audiences about this topic, I like to say that BGP security, this whole topic, is not one thing. This is a constellation of problems: we have a variety of different things that can go wrong, it’s going to require a few different solutions, and there’s also a spectrum of difficulty. At one end are kind of the boneheaded errors that hopefully we could come up with some automated ways to prevent from causing disruptions, on up to the determined adversary I mentioned a minute ago. But yeah, we can mention some of the things that networks are using.

A lot of it has to do with how you filter the routes that you accept. So, in your previous episode, I’m sure you kind of went through this process of routing by rumor. An AS will accept routes from an adjacent AS to try to learn how to reach other parts of the internet, so you’d like to have some sort of quality control over the routes that you accept. And in that genre of mechanisms, you have the very coarse thing that we call a maximum prefix setting. So, if I normally get ten routes from you, I shouldn’t tomorrow suddenly start getting a million. There’s probably a problem. And if that happens, the router should maybe kill the session or take some sort of an action.
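The maximum prefix idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BgpSession` class, its fields, and the tear-down behavior are hypothetical names made up for this sketch, not any router vendor's implementation, and the AS number and prefixes come from documentation ranges.

```python
from dataclasses import dataclass, field


@dataclass
class BgpSession:
    """Toy model of one BGP session with a max-prefix limit (hypothetical)."""
    peer: str
    max_prefixes: int
    prefixes: set = field(default_factory=set)
    established: bool = True

    def receive(self, prefix: str) -> bool:
        """Accept a route unless the session is down or the limit is exceeded."""
        if not self.established:
            return False
        is_new = prefix not in self.prefixes
        if is_new and len(self.prefixes) >= self.max_prefixes:
            # Limit exceeded: kill the session and forget what it taught us.
            self.established = False
            self.prefixes.clear()
            return False
        self.prefixes.add(prefix)
        return True


# A neighbor that normally sends 3 routes suddenly sends a 4th.
session = BgpSession(peer="AS64500", max_prefixes=3)
for p in ["192.0.2.0/26", "192.0.2.64/26", "192.0.2.128/26"]:
    session.receive(p)
session.receive("198.51.100.0/24")  # trips the limit, session goes down
```

Real routers usually offer softer actions as well, such as logging a warning before the limit is reached; the sketch only models the "kill the session" option mentioned above.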

So, that was one of a number of, like, really simple mechanisms that we—I don’t want to take credit for it, but the industry adopted early on. And then there’s filtering. There’s a couple of different ways ISPs filter the routes that they receive from their customers. Depending on the complexity, like, maybe it’s a really easy case: they know this is going to be a couple of ranges, they can put this into the configuration, and they should only receive this type of route from a customer. But as you go up the stack in the internet hierarchy, if you’re going to larger companies, it’s going to be very hard for them to know what are all the possible routes that could come through another layer.

If, like, a Tier 1 is accepting routes from a Tier 2, that Tier 2 can have a lot of different customers, a lot of different types of routes, so then we need an automated process to build those filters. And so, we use what we call IRRs, Internet Routing Registries, to store the information used to build these. And so, in that, we’ll say—there’s a couple different mechanisms there, but it’ll essentially try to whitelist what are the routes that would be acceptable to be received from a customer, and when you receive something that is outside of that list, then reject it.
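As a rough sketch of that allowlist approach: assume the registry data has already been fetched and reduced to (prefix, origin AS) pairs. The function names and the exact-match rule below are assumptions made for illustration; production filter generators handle AS-sets, more-specifics, and multiple registries.

```python
import ipaddress


def build_allowlist(route_objects):
    """Group registered prefixes by the customer AS allowed to announce them."""
    allow = {}
    for prefix, origin_as in route_objects:
        allow.setdefault(origin_as, set()).add(ipaddress.ip_network(prefix))
    return allow


def accept_route(allowlist, customer_as, prefix):
    """Accept only routes that exactly match a registered prefix for that AS."""
    return ipaddress.ip_network(prefix) in allowlist.get(customer_as, set())


# Hypothetical registry contents (documentation prefixes, private-use ASN).
registered = [("192.0.2.0/24", 64500), ("198.51.100.0/24", 64500)]
allowlist = build_allowlist(registered)

accept_route(allowlist, 64500, "192.0.2.0/24")    # registered, accepted
accept_route(allowlist, 64500, "203.0.113.0/24")  # not registered, rejected
```

The filter is only as good as the registry data behind it, which is why bad entries in a registry translate directly into bad routes being accepted.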

Problem is, there’s, like, 30 of those. There’s no single truth that everybody agrees upon. It can vary. Some of them have had some security issues, like the registry data itself has been a target of an attacker who had found a way to put bad information in that would enable them to announce routes they’re not supposed to announce. And so, the IRR area is something we use. It’s widely used, but we’re hoping to try to get past that and use things like RPKI ROV.

So that’s, kind of, the technology du jour right now. And let me explain a little about what that is. So, RPKI is Resource Public Key Infrastructure. So, this is a cryptographically secure and enforced platform, and ISPs can use services that are built off of RPKI to perform route filtering. And so, ROV, Route Origin Validation, is one of those services. Hopefully, there’ll be more, is the vision, but ROV is the first one. A lot of times in this space, when people say RPKI, they’re really referring to RPKI ROV because that’s the one application that people are actually using.

So, Route Origin Validation, the way this works is that the address holder, or what we call a resource holder, the person who owns the address space, would typically do this through their RIR. So, I’m using a lot of acronyms here, but if you are in North America, your RIR is ARIN, and you would log into the ARIN portal, you would have the account login, if you are the owner of that address range, and through there, you can build a ROA—[laugh] another acronym—a Route Origin Authorization, to say, what is the AS that is allowed to originate this address range? And there’s an expiration date, a max prefix length, there’s a few other details there, but mostly this is what’s the correct origin?

Now, that information then gets published out to the internet, and every entity in the world that is rejecting RPKI-invalid routes will then use that information. When they receive a route, they would check the AS path and look at the rightmost AS in the AS path—that would be considered the origin—and see if that matches the origin listed in the ROA, the Route Origin Authorization that’s stored in RPKI. So, hopefully you can follow that [laugh], but you know, the benefits there are that all the information is cryptographically enforced, you can’t force some bad information in the path here, and it is one ground truth for the world, so we don’t have that doubt or uncertainty of which document are we going off of.
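The origin check described above can be sketched in Python. This is a minimal illustration that follows the general valid / invalid / not-found logic of route origin validation; the `Roa` tuple and `validate` function are names invented for the sketch, the ROAs are assumed to be already cryptographically verified, and the prefixes and AS numbers are documentation values.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    """A simplified Route Origin Authorization (hypothetical structure)."""
    prefix: str
    max_length: int
    origin_as: int


def validate(prefix, as_path, roas):
    """Return 'valid', 'invalid', or 'not-found' for one received route."""
    route = ipaddress.ip_network(prefix)
    origin = as_path[-1]  # the rightmost AS in the path is the origin
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA says nothing about the route
        covered = True
        if roa.origin_as == origin and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: reject-worthy.
    return "invalid" if covered else "not-found"


roas = [Roa("192.0.2.0/24", 24, 64500)]
validate("192.0.2.0/24", [64510, 64500], roas)     # correct origin
validate("192.0.2.0/24", [64510, 64666], roas)     # wrong origin
validate("192.0.2.0/25", [64510, 64500], roas)     # longer than max length
validate("198.51.100.0/24", [64510, 64500], roas)  # no covering ROA
```

Note that only the rightmost AS is examined. Nothing in this check looks at the rest of the path.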

This was discussed for many years, and there were advocates and debate around this. And eventually, it finally took hold. About four years ago or so, we started seeing adoption. And for a while the issue was that trying to deploy a security mechanism globally on the internet is a really hard thing, right? There’s no money in it for anybody. Everyone doing this is doing it for the benefit of the rest of the internet.

So, to get people to do it, there’s two steps that a network has to perform in order to have deployed RPKI ROV. So, they would create ROAs for their address space to essentially communicate to the world what’s the correct origin, and then they also need to reject RPKI-invalid routes that are coming through their network. But for a while, we had the chicken-or-the-egg problem where, why would anybody bother creating ROAs because no one’s rejecting invalids? And why would anybody reject invalids because no one’s creating ROAs? Well, we’ve somehow managed to get ourselves past that chicken-or-the-egg phase.

And probably the biggest facilitator was when we had Tier 1s, back in the year 2020, start rejecting invalids. And I’m talking about what used to be called Telia, now it’s Arelion, plus Lumen, Cogent, GTT, like, these really big global, top-of-the-internet telecoms. They have huge downstream customer cones and cast a wide shadow, and so when they do something, it has broad effect. So, that was the year that those networks started rejecting invalids, and that really started the ball rolling, in my opinion. If you’ve tracked these things through time, you can see there’s an inflection point around that time, where people started creating ROAs because they felt like someone was going to actually do something about this. And just in May of this year, we crossed the arbitrary milestone of getting past 50%: more than half of the routes in the global IPv4 table now have ROAs and are essentially eligible for the protection that RPKI ROV would offer. IPv6 reached that milestone last year, probably due to the fact that it has less legacy stuff to deal with.

But anyway, so we look at this as success, and getting ourselves on the right path. However, I want to be careful. People who work on this topic try to choose our words carefully so as not to overstate what RPKI ROV is going to do for anyone because it can be defeated. What it is most successful at is suppressing routes that are due to misorigination. So, if someone has incorrectly—whether deliberately or not—originated address space they’re not supposed to, then essentially the system will just suppress those routes, and it will reject those invalids.

Like I said, it isn’t foolproof, and we’re going to need more mechanisms to try to go up the spectrum and push the needle on up to the determined adversary to try to secure that side. But we had to start somewhere in this space, and as you can imagine, the internet’s a big place, there’s a lot of different companies and people working here, so to pull off a voluntary adoption of a global technology is a non-trivial thing. So, I’ll stop there, see if you have any questions.

Chris: So, one of the biggest things, I think, if anybody knows anything about problems with BGP, the things that make the news are like when countries make a mistake and accidentally transfer 75% of the internet’s traffic through Pakistan, or through China, or accidentally knock YouTube offline for 25 minutes due to routes that become completely invalid, and are transmitted out to the entire world. So, these are not necessarily adversarial attacks, right, but these are people that have rights to publish global tables, right? Does any of the stuff that you’ve talked about interact there in any way to help or mitigate with problems that could be propagated worldwide from what would be considered a valid Tier 1 endpoint?

Doug: Yeah, so you brought up the Pakistan YouTube incident, which is probably one of the more famous BGP incidents that have ever occurred. It’s worth reflecting—it’s probably worth a topic on a blog post—of, like, if the same thing were to happen today, how would that be different? Because what took place is really not possible anymore. So, just a backstory here. This occurred, I believe, in 2008, when there was a video on YouTube that was deemed anti-Islamic, and so the government of Pakistan gave an order that they needed to block YouTube.

And so, this came down to—BTCL is the state telecom of Pakistan—they decided they would do this via BGP. And so, they would create a route for YouTube address space, try to attract all the traffic that was going to YouTube, and put it in a bit bucket when it comes. So, that part, it was intentional. What wasn’t intentional was that they announced this, accidentally, out to one of their international transit providers, who carried it out to the global internet. And so then, because it was a more specific, like, BGP prefers more specific, longer prefix length routes, it became very popular, and about two-thirds of the internet, for I don’t know, a couple hours was believing that they need to go to BTCL in Pakistan for YouTube. That made YouTube unreachable. Pakistan also wasn’t doing great either. They were not used to getting that kind of traffic. Yeah, so ultimately, that was resolved. I think Google had recently purchased YouTube, they intervened, and got the international carrier to stop carrying the route.
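The "more specific wins" behavior is a longest-prefix match, and a small Python sketch shows why a leaked more-specific route pulls traffic away from the legitimate one. The prefixes and next-hop labels below are hypothetical documentation values, not the actual prefixes from the 2008 incident, and `best_route` is a name chosen for this sketch.

```python
import ipaddress


def best_route(destination, routes):
    """Pick the next hop whose prefix matches with the longest prefix length."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if net.version == addr.version and addr in net:
            matches.append((net.prefixlen, next_hop))
    if not matches:
        return None
    # Longest prefix (largest prefix length) wins, regardless of AS path.
    return max(matches)[1]


table = {
    "192.0.2.0/24": "legitimate origin",
    "192.0.2.128/25": "leaked more-specific",
}
best_route("192.0.2.200", table)  # covered by both; the /25 wins
best_route("192.0.2.5", table)    # only the /24 covers this address
```

Addresses inside the more-specific range follow the leaked route everywhere it propagates, which is how a locally intended block can end up attracting traffic from much of the internet.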

But if we looked at that case again today—I call these intentional but also accidental. Because we had this happen recently. There was a couple of cases that are worth thinking about, where—a similar case in Myanmar, in let’s see—I’m trying to get my dates right; I guess it was spring of 2021. There was a military coup in Myanmar, and there was a lot of government involvement in suppressing communications. So, there was, like, a total shutdown. There were mobile, like, nightly shutdowns. There was all kinds of different things. Everything we’ve ever seen in the digital [rights 00:15:02] space was happening over a couple of months.

They even had their own Pakistan YouTube incident, where they had given an order out to the ISPs to block social media. One of the ISPs in Myanmar decided they would do essentially what BTCL had done in 2008. They would take Twitter address space, they would basically announce it locally, was I think their plan, and just drop that traffic when it comes. But they announced this out to the internet. And so, around South Asia, there was a Twitter outage where all this traffic was getting directed to Myanmar. And then fast-forward one year later, almost a year to the day of that incident, the exact same thing happened to the same address range, same prefix, everything, and this time, it was out of Russia. So, Russia had invaded Ukraine, there was a backlash, and within Russia, they started cracking down on independent media and social media, and we had that same thing: an ISP created a BGP route to block Twitter, accidentally announced it out to a transit provider, but the difference was that between the spring of 2021 and spring of 2022, Twitter—now X—had created ROAs for all their address space.

So, this route now had a ROA, and routers all over the world would know when they see the one coming from Russia, this isn’t the right one, and they would reject it. Now, that doesn’t help the people in Russia, but it at least contains the disruption to that area, and we’re not going to be able to intervene beyond that. Anyway, so that’s a good story of that growth. So, if BTCL today was to announce those YouTube routes, they would probably go nowhere, and they would probably affect no one.

The other thing is that YouTube isn’t pulled across transit providers anymore; it’s served through embedded caches in your ISPs in nearly every country in the world. There’s really very few exceptions to that. So, even if the RPKI wasn’t there and the route got out, I don’t know that it would even mess that much up—it’d be worthy of debate how much impact it would really have. It certainly would be significantly less than what occurred in 2008, just due to how the internet has evolved and how content is delivered. But within the routing space, it also kind of can’t happen, or at least it will be limited. I wouldn’t say there’s zero impact. But RPKI ROV, like I said, it’s good in those cases. This is where it’s strong.

And then in recent years, where we’ve seen a lot of activity in the determined adversary category is against cryptocurrency services. So, these are great targets for hackers because if you can crack one of these places and steal the money, you can have it immediately, and launder it, you’re gone and there’s no recourse. So, it makes for good targets. And so, we’ve seen some specific attacks that involve BGP hijacks, including one that did a hijack against Amazon—so this is not a mom-and-pop shop; this is one of the most well-resourced networks in the world—that did all these things. It defeated RPKI ROV by forging the AS path so it would be seen as valid, so the route—the bad route—would get circulated. It created a fake entry in AltDB, one of the IRRs that are used for automated creation of filters.

And so, there were a lot of lessons learned there. But some of these cryptocurrency attacks—I wrote up a piece; maybe we can put it in the [notes 00:18:22] for the episode—just looking at, like, this is where people in our space are trying to spend more time thinking. And so, that’s progress, where we’re less worried about these fat-finger cases, which are kind of covered as best as we can get them, and now we need to focus on the determined adversary scenario.

Ned: So, I read through that article that you’re talking about, and I found that one example you gave of the cryptocurrency and the way that they had used all the infrastructure you would expect to, sort of, spoof where the routes were coming from, and then make it all look legitimate so traffic would go there and be none the wiser, and then they would just, you know, steal your bitcoins or whatever, and abscond with them. Are there some new standards or things coming down the pike to also protect against that sort of scenario?

Doug: Yeah. So, there’s always more. The next thing that is being pushed or advocated for is something called ASPA. So, that doesn’t deal with the cryptocurrency attack scenario, but… Autonomous System Provider Authorization. So, the way this works is, each network, which is an autonomous system, is going to, within the system, assert what its transit providers are. And in the same way that we create a ROA, this will be information that’s stored within the RPKI platform and cryptographically delivered everywhere.

So, then by asserting what are your transit providers, this enables other networks to look at an AS path and detect what we call a valley-free violation. And so, maybe I’ll explain a little bit what that is. So, in BGP, I don’t know if you got into this in your last episode, but there is hierarchy to the whole thing. So, it’s not—I think, maybe in a textbook, it might look like it’s a little amorphous, everybody just kind of connects to everybody else or—but there’s a hierarchy to it. So, for the most part, you have networks that are buying transit from other networks.

So, you get to the top, there’s a default-free zone or transit-free zone. The top of the internet is kind of a cabal there of a dozen or more ISPs that don’t buy service from anybody; they just sell, and then they connect to each other. And so, as traffic goes across the internet, it’s either crossing these transit edges, or the alternative is a peering relationship, and these are the two classic types of relationships. What you can’t do is—your traffic is always going to go over a hill. So, you’re going to be going up the transit links until you get to a top, and then come down the other side to the destination. Or maybe you’ve got a way to kind of cut through it with a peering relationship, but you can’t go down and then back up.

Because when we draw this on diagrams, we say, you know, transit is up. This is how we draw it, and it’s the mental model. It may not translate in the podcast very well, but—so then if you’re going down, basically you’re drawing a line: you went from a provider to a customer, back up to a provider, and then that customer is paying both sides to send that traffic. It’s all about money. It’s as much about business as it is about technical stuff.

And so, when you and I have internet connections at our house or phone, for the most part, they’re kind of all you can eat: you pay some sort of flat fee, and unless you do something crazy or you’re hosting… whatever stuff, you don’t worry about how much you’re using. But in the wholesale market, it’s by bit, so you pay by volume. And so, the providers are trying to either reduce costs or increase revenue, and one way they reduce costs is by peering so they can get around the transit providers. But if you’re going from a provider to a customer and back up to a provider, then that customer is paying for both legs and is receiving no money, so that’s something you would never want to have happen, and people go to great lengths to try to avoid that because you’re basically paying twice for something you’re not getting paid for. So, that’s a valley in this valley-free terminology. And so, that does happen, but it’s usually a leak, like a mistake. So, an AS has taken a route from one side and sent it to another by mistake.
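The valley-free rule can be expressed as a small check: following a route outward from its origin, it may climb customer-to-provider links, cross at most one peering link, and then only descend provider-to-customer links. The sketch below assumes the business relationships are already known, which is the information ASPA is meant to publish; the `is_valley_free` function, the relationship tables, and the private-use AS numbers are all hypothetical, and this is not the ASPA verification procedure itself.

```python
def is_valley_free(as_path, providers, peers):
    """Check an AS path (origin rightmost, as BGP shows it) for valleys.

    providers maps an AS to the set of ASes it buys transit from.
    peers is a set of frozenset pairs with a settlement-free relationship.
    """
    hops = list(reversed(as_path))  # walk the route from the origin outward
    descending = False
    for sender, receiver in zip(hops, hops[1:]):
        if receiver in providers.get(sender, set()):
            step = "up"      # customer hands the route to its provider
        elif sender in providers.get(receiver, set()):
            step = "down"    # provider hands the route to its customer
        elif frozenset((sender, receiver)) in peers:
            step = "peer"
        else:
            return False     # unknown link; this sketch treats it as a violation
        if descending and step != "down":
            return False     # went up or sideways after coming down: a valley
        if step in ("peer", "down"):
            descending = True
    return True


# AS64500 is a customer of both AS64510 and AS64520; AS64530 is the origin.
providers = {64500: {64510, 64520}, 64530: {64520}}
peers = {frozenset((64510, 64520))}

is_valley_free([64500, 64510, 64520, 64530], providers, peers)  # up, peer, down
is_valley_free([64510, 64500, 64520, 64530], providers, peers)  # leak via customer
```

In the second path the customer AS64500 has passed a route learned from one provider up to its other provider, which is the leak pattern described above.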

Ned: We covered that a little bit in the last episode because I talked about what happened with the Allegheny provider where it was using Verizon and… DQE or something, were the two providers that it was using, and it had accidentally leaked routes from Verizon out through DQE, telling everybody, you know, send your traffic through me, basically. And so, that would have been a big valley.

Doug: Yeah. It was a great example of a valley. So, in that case, yeah, Allegheny is a customer of both Verizon and DQE. The routes were coming from DQE to Allegheny to Verizon. The vision is that this could be picked up and just blocked immediately had, you know, Allegheny asserted in RPKI, using ASPA, who its transit providers are. People would look at that and be like, “Okay, somebody made a mistake. I won’t carry this.”

The incident has a few lessons learned. Obviously, Cloudflare made a big stink about it because they got affected, rightly so. And it did prompt a discussion around RPKI ROV. But what’s interesting to me is that RPKI ROV would have helped in that case, but not because of the origin being changed. The origins for the routes that were leaked were actually intact. So, like, Cloudflare is 13335—as somebody who works with this stuff, I have, like, thousands of these ASNs memorized—so the AS origin, the rightmost AS, [unintelligible 00:23:42] paths, it was correct.

So, by checking the origin, it wouldn’t have filtered the routes. But because the [unintelligible 00:23:49] DQE was using a route optimizer, which locally creates these more specific routes to try to do traffic engineering, those more specific routes then were what really attracted the traffic, but they also would have been RPKI invalid because the routes of Cloudflare, and I think Akamai was in this as well, had ROAs that set a maximum prefix length. So, what we call a prefix: a BGP route is an address range, and the address range has got, you know, a network portion and a host portion, and the prefix length sets what’s the network portion. And then, in our parlance in routing, we just talk about prefixes. But in that case, those prefixes would have been invalid due to the max prefix length setting in the ROAs.
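To make the prefix terminology concrete, here is a short Python sketch using the standard `ipaddress` module. It splits a prefix into its network and host portions and then shows why a more-specific route can exceed a ROA's maximum length even when the origin is untouched. The helper names and the documentation prefix are assumptions for illustration; they are not the prefixes involved in the incident.

```python
import ipaddress


def describe(prefix):
    """Split a prefix into its network bits, host bits, and address count."""
    net = ipaddress.ip_network(prefix)
    return {
        "network_bits": net.prefixlen,
        "host_bits": net.max_prefixlen - net.prefixlen,
        "addresses": net.num_addresses,
    }


def exceeds_max_length(prefix, roa_prefix, roa_max_length):
    """True if the route sits inside the ROA's range but is too specific."""
    net = ipaddress.ip_network(prefix)
    covered = net.subnet_of(ipaddress.ip_network(roa_prefix))
    return covered and net.prefixlen > roa_max_length


describe("198.51.100.0/24")  # 24 network bits, 8 host bits, 256 addresses

# A route optimizer splits the /24 into two /25s for traffic engineering.
# The ROA allows nothing longer than /24, so the /25s fail on length alone.
exceeds_max_length("198.51.100.0/25", "198.51.100.0/24", 24)
exceeds_max_length("198.51.100.0/24", "198.51.100.0/24", 24)
```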

So, I also bring that case up because that’s probably the last really big debilitating routing leak that’s occurred. And so, that was, I think, 2019. We’re more than five years out, and when I give talks, I say it’s not an accident that that was the last. I mean, we may have another one. While we’re speaking right now, something disastrous could be happening, but we’ve gone a long time.

Five years is a long time in internet time, and so things have gotten better due to a lot of these things: the filtering, we talked about max prefix length. ASPA really is just getting started, so I don’t think we’re going to see benefit from it for a little while, but RPKI ROV is helping. Yeah, there’s just a variety of different—we call it routing hygiene, just all these different things that providers do. And there’s no… there’s a lot of best practices.

And MANRS is an organization—that’s Mutually Agreed Norms for Routing Security—that is kind of the industry’s advocacy group for enumerating the steps networks should take to improve routing security and routing hygiene, and they’ve been very instrumental in being the go-to, and being good advocates for routing security.

Ned: So, MANRS is sort of like an opt-in, right? You want to be a good citizen, you want to have good manners?

Doug: That’s correct, yeah.

Ned: I know recently the FCC has been trying to push being a little more stick and less the carrot when it comes to BGP security, so what is the FCC trying to do, and how is the industry responding to what they’ve been pushing?

Doug: Yeah. So, back in 2022, the FCC—that’s the Federal Communications Commission, a US agency overseeing our telecommunications sector in the United States—decided to get involved, or figure out what leadership it needs to provide in routing security. Obviously, this is something that affects the United States, and there’s a national security element to this as well. They began a process where they were seeking input, asking industry experts what they think the FCC should be doing. And there’ve been a bunch of events that they’ve held and documents they’ve published.

And then this year, within the last couple of months, they published what are—they’re seeking comments, really trying to elicit feedback from the industry—and they’re getting some—of, like, what should be the rules they require of US telecoms? And so, what they did was identify a list of nine telecoms. They call them BIAS—always another acronym—

Ned: [laugh].

Doug: It’s not an acronym I’d seen before, but there’s always a new one. The proposal is that these nine telecoms need to deploy RPKI ROV. So, there’s two things that they need to do. They need to create ROAs, and there’s a way we can all measure and see that they’ve done that. So, BIAS stands for Broadband Internet Access Service, and the companies are AT&T; Altice, which owns Suddenlink and Cablevision; Charter, which is also Spectrum; Comcast; Cox; Lumen, which I don’t know if everybody knows is both a regional [unintelligible 00:27:41] provider and then also plays this role in the global internet; T-Mobile, a mobile operator; TDS, which now owns, or is in the process of owning, US Cellular; and Verizon. Anyway, those are the nine. Each of these companies needs to create ROAs for their address space, covering at least 90% of the routes that they originate, and also reject invalids. And so, they’re seeking comment on this. And so, what I decided I would look at was, all right, well, now that they named these companies, let’s see how they fare today, and I just ran through the numbers.

At least on the ROA creation side, it’s very tricky for us to remotely figure out to what extent, if at all, they’re rejecting invalids. Different people have come up with different methodologies; it’s still kind of an open research area. But five out of the nine would do very well right now, or they’re probably already at 90% of the routes. And I work at a company that deals in large amounts of NetFlow, so we work in the service provider space, so these are companies that we would work with, and so we have a lot of NetFlow that gets shared to us as part of the service that we provide, and so we have a nice slice of the internet of just the traffic that’s going across it for analysis and study. This is what I spend a lot of my time doing.
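The 90% figure is a simple share of originated routes that have a covering ROA, and it can be sketched as follows. The input format, function names, and sample data are hypothetical and are not Kentik's methodology; they only illustrate the arithmetic of the proposed threshold.

```python
def roa_coverage(routes):
    """Fraction of a provider's originated routes that have a ROA.

    routes is a list of (prefix, has_roa) pairs for one provider.
    """
    if not routes:
        return 0.0
    with_roa = sum(1 for _prefix, has_roa in routes if has_roa)
    return with_roa / len(routes)


def meets_threshold(routes, threshold=0.90):
    """True if the provider meets the proposed coverage threshold."""
    return roa_coverage(routes) >= threshold


# Made-up provider originating ten routes, nine of them with ROAs.
sample = [(f"192.0.2.{i * 16}/28", i != 0) for i in range(10)]
roa_coverage(sample)     # 9 of 10 routes
meets_threshold(sample)  # exactly at the 90% line
```

Counting routes is only one way to measure this. Weighting by traffic volume, as the NetFlow analysis described here does, answers a different question: how much of the traffic is heading to protected routes.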

And so, I looked at the traffic that we see going to these large US telecoms, and how much of it was going to routes with ROAs, and for a few of them, like T-Mobile, and Cox, and Comcast, and Charter Spectrum, it’s nearly universal. Almost every packet going to those networks is going to routes with ROAs, meaning that they’re eligible for protection. That’s one side of the internet transaction. That’s the traffic coming back to the user. The other side is not really covered in this proposal, which would be what are they going to?

Where are they sending their traffic, where are they requesting data from, which could be a bank or cryptocurrency service that’s under attack? That side of it, they’re not getting into, and I suspect that may not be where they focus their time. They’re looking at what are the rules that the US telecoms need to follow. Yeah, so I went through this, and yeah, it’s generating a little bit of discussion here in our community of just, for one, whenever you do something like this, you can’t get it all right. I tried to be explicit about what AS’s I was using for the analysis. I’m getting some feedback I missed a couple. I’m happy to update it, but some of these companies use dozens, or AT&T has over a hundred AS’s that they use, so I’m trying to manage the complexity of that and still provide some analysis.

But, you know, in this proposal, The Internet Society, which was the former home of MANRS, which we talked about earlier, which since has moved to the Global Cyber Alliance, those two entities wrote up a joint ex parte response to the proposal by the FCC pushing back pretty strenuously against a federal requirement that telecoms adopt RPKI ROV. And the points they made were that it’s dangerous in security to legislate you have to use this particular solution because now everybody has to do that thing, and maybe it becomes obsolete or things have changed, and now it’s counterproductive, and you’re stuck complying with this rule. There’s a concern there of just ossifying some kind of a requirement. And then also, I guess, smaller providers were kind of excluded from this first pass. They made the argument that, for the small providers, this would be a cumbersome or burdensome thing to comply with.

But, you know, the first point they made was using the analysis that myself and this other expert in the space that we’ve kind of collaborated quite a bit on this topic, this guy Job Snijders, who’s at Fastly now, is probably the leading voice in routing security, has been for a while. In fact, a lot of the progress that we’ve made is directly attributable to his work. So, he and I worked together quite a bit on this, and the FCC rules and the ex parte response pushing back on them both relied on our analysis that showed that we have a lot of ROAs that have been created and in fact, those ROAs represent the majority. Now, it’s a super majority of the traffic that’s exchanged in the internet is going to routes with ROAs. And then conversely, routes that are deemed invalid just don’t get propagated as much. In fact, they were suppressed quite a bit, so the system is working as designed, and we’ve reached a point of adoption where the next network to do these things, to create ROAs for their address space, to start rejecting invalids, would have immediate benefit because there’s been so much adoption thus far. That work was highlighted in both the FCC document as well as the pushback.
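
A note for readers who want the mechanics: the sketch below is not from Doug's analysis. It is a minimal Python illustration, in the spirit of RFC 6811 origin validation, of how a route gets classified as valid, invalid, or not-found against a set of ROAs, and of how one might compute the share of traffic headed to routes that validate. The `ROA` tuple, the function names, the flow records, and the documentation prefixes and AS numbers are all assumptions made for illustration; a real measurement would draw on validated ROA data and NetFlow.

```python
import ipaddress
from typing import List, NamedTuple, Tuple


class ROA(NamedTuple):
    """Hypothetical, simplified ROA: a prefix, a max length, and an authorized origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, roas: List[ROA]) -> str:
    """Classify a route as 'valid', 'invalid', or 'not-found' (RFC 6811 style)."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs of the other address family (IPv4 vs. IPv6).
        if route.version != roa_net.version:
            continue
        # A ROA "covers" the route if the route sits inside the ROA prefix.
        if not route.subnet_of(roa_net):
            continue
        covered = True
        # Valid needs a matching origin AS and a prefix no longer than max_length.
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"


def valid_traffic_share(flows: List[Tuple[str, int, int]], roas: List[ROA]) -> float:
    """Fraction of bytes destined to routes that validate.

    Each flow is an assumed (destination_prefix, origin_asn, byte_count) record.
    """
    total = sum(nbytes for _, _, nbytes in flows)
    valid = sum(
        nbytes
        for prefix, asn, nbytes in flows
        if validate(prefix, asn, roas) == "valid"
    )
    return valid / total if total else 0.0


# Documentation prefixes and AS numbers, not real networks.
roas = [
    ROA("192.0.2.0/24", 24, 64496),
    ROA("198.51.100.0/24", 24, 64497),
]
flows = [
    ("192.0.2.0/24", 64496, 900),    # matches a ROA
    ("203.0.113.0/24", 64500, 100),  # no covering ROA
]

print(validate("192.0.2.0/24", 64511, roas))  # wrong origin for a covered prefix
print(valid_traffic_share(flows, roas))
```

A network that "rejects invalids" drops routes in the invalid state while still accepting not-found ones, which is why creating ROAs and filtering on them are tracked as two separate kinds of adoption.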

Their point was, look, the industry has already made a lot of progress, and there was no government mandate. I mean, that’s a pretty good argument because all this progress I’m describing, there’s no government mandate, it’s just simply advocacy work within the communities around the world, and getting it to a point where there’s some peer pressure, if not shame, to motivate networks that aren’t doing the stuff to do it. And if things change, and there’s a better solution, and this needs to be abandoned, then so be it, but right now, this is—it didn’t require any government intervention, and so that’s part of their argument. I think there’s a lot of people who, certainly in the industry, that will be their take. We’ve actually done a pretty good job without any government rules.

And folks in our space, even if we are sympathetic to the issues that are being brought up by the FCC, are a little leery about codifying something. And how hard would that be to change, and what are the unintended consequences of that? I think people worry and wring their hands a little, rightly so. But anyway, that’s kind of the latest in that. And we’ll see how this goes. I think there’s a few more days as of the recording. We’re recording this on July 12th; I think you can submit—this will probably be published after this deadline, but on July 17th, I think, is the deadline for submitting your pushback. I’m sure people are writing their opinions.

Ned: Yeah. We’ll be one day after that for publish, so you—

Doug: Oh, well. Well [laugh].

Ned: —you just missed it [unintelligible 00:33:30] [laugh].

Doug: Too bad.

Ned: Yeah, no, I echo your concerns because I know anything that goes into a government regulation tends to ossify, and so now you’re stuck with a very—if it’s written in such a way that it’s a very specific technology, you’re just stuck with that until the lawmakers get around to updating it. And we covered a story in our lightning round this week, all around how Japan has finally gotten rid of using floppy disks in the government agencies. And the reason they were using floppies is not because people wanted to; it’s because there were very specific regulations on the books that required them to use floppy disks of a specific type and size. And so, they’re like, “Well, that’s what we have to do.”

Doug: Wow. Yeah, so then we have an added complication to that whole discussion is the Supreme Court decision that overturned Chevron Deference. So—

Ned: Right.

Doug: —I am not a lawyer. I don’t know about you guys, so I have a—

Ned: Not at all [laugh].

Doug: —very layman’s understanding of this, but essentially, it would curtail what a regulatory body can do on its own without this being explicitly laid out in legislation from Congress. And that means—this is not clear to me again, as a layperson, can they make these rules now? Is this considered under something that’s presently in legislation? Because if it’s going to require Congress to be involved, oof, then all bets are off, I think [laugh]. We’re definitely better off just letting the industry take care of itself. But anyway, directionally, I appreciate their concern. This is an area that is worth trying to improve, but yeah, we just have to be careful not to create something counterproductive.

Ned: Yeah, absolutely. Well, Doug, we’re coming up on time. Before we say goodbye, where can people find you on the internet if they want to know more? And can you tell us a little bit about what Kentik does as well, as your employer?

Doug: Yeah, sure. So, let’s see, Kentik is a network observability company, and so we are best known for the NetFlow analytics products we’ve been building for eight years. And we grew up out of the service provider industry, but we now have—I think the majority of our customers are what we call enterprise. So, these are just people who aren’t telecoms or ISPs. That’s who we spend a lot of our time with.

So, there’s the NetFlow thing. We help companies understand how they’re exchanging traffic with their cloud deployments. So, that seems to be a pretty big topic for us these days. We see a lot of demand there. We do synthetics, which is basically performance monitoring, and BGP. So, BGP is kind of my area, and so we have BGP monitoring and analysis capabilities. Yeah, it’s a cool company. And as far as reaching out to me, I’m on Twitter X, I’m on LinkedIn. Those are usually the best places to reach me. Otherwise, maybe we can add a little link to—I write blog posts for the Kentik blog. That’s usually where I’m publishing stuff out to the world.

Ned: Awesome. Well, we’ll include links to all those kinds of things in the show notes. Doug Madory, thank you so much for being a guest with us on Chaos Lever. And hey, dear listener, thanks for tuning in. I guess you found it worthwhile enough if you made it all the way to the end, so congratulations to you, friend, you accomplished something today. Now, you can go sit on the couch, fire up some ROAs and implement RPKI. You have earned it.

You can find more about this show by visiting our LinkedIn page, just search ‘Chaos Lever,’ or go to our website, chaoslever.com where you’ll find show notes, blog posts, and general tomfoolery. We’ll be back next week to see what fresh hell is upon us. Ta-ta for now.
