Chaos Lever Podcast

Ned and Chris give a very brief overview of BGP, its place in the history of the internet, and how it works today.

It’s a Confusing Day in the Neighborship

Sure, Kim Kardashian broke the internet that one time, but she’s not the only one capable of such a feat. In this episode, Ned and Chris recount the tale of how Verizon and a BGP optimizer took large swaths of the internet offline in 2019. This leads them into the intricacies of the Border Gateway Protocol, tracing its evolution from a temporary solution for NSFNET in the 1980s to a foundational element of internet routing today. Along the way, they explore the operational details of version four, including key attributes like local preference and AS path length.

Links

BGP Deep Dive: https://www.youtube.com/watch?v=SVo6cDnQQm0
BGP defined: https://en.wikipedia.org/wiki/Border_Gateway_Protocol
ASN Allocation: https://www.nro.net/about/rirs/statistics/
Three napkins protocol: https://www.stuff.co.nz/technology/digital-living/69048160/the-three-napkins-protocol-quick-fix-for-early-internet-problem-left-web-open-to-attack
Or was it the TWO Napkins protocol?!?: https://computerhistory.org/blog/the-two-napkin-protocol/?key=the-two-napkin-protocol
Allegheny and DQE mess up the internet: https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/
NSFNET: https://en.wikipedia.org/wiki/National_Science_Foundation_Network
NSFNET Backbone: https://new.nsf.gov/impacts/internet
Leading Tier 1 ISPs: https://macronetservices.com/who-are-the-leading-global-tier-1-isps/

What is Chaos Lever Podcast?

Chaos Lever examines emerging trends and new technology for the enterprise and beyond. Hosts Ned Bellavance and Chris Hayner examine the tech landscape through a skeptical lens based on over 40 combined years in the industry. Are we all doomed? Yes. Will the apocalypse be streamed on TikTok? Probably. Does Joni still love Chachi? Decidedly not.

Ned: I made the unfortunate decision to just use chaoslever.com and no subdomain [laugh]. So, there’s two problems.

Chris: One is Ned.

Ned: One is me [laugh]. I am always the perennial problem. They go with the assumption you want to use ‘www’ as your subdomain, so they do support setting your apex record—the at record—for chaoslever.com to—

Chris: [loud snores].

Ned: [laugh] you’re very—you’re cruel.

Chris: [more loud snores].

Ned: [laugh]. Goddammit. Hello, alleged human, and welcome to the Chaos Lever podcast. My name is Ned, and I’m definitely not a robot. I am a sentient, real human person with feelings, dreams, and just the general desire to smoothly migrate a website and not have everything go to shit. [sigh]. With me is Chris, who was also here? Mostly.

Chris: Have you ever read my favorite philosophical tract?

Ned: I don’t know.

Chris: It’s a short one. It’s ancient text. It was translated, I think, from the Sumerian.

Ned: Okay.

Chris: And the title is, “Whatever You’re Trying To Do,” Sumerian question mark, dot dot dot, “Yeah, Good Luck With That.”

Ned: [laugh]. Wow, that is a philosophy that is just broadly applicable to every situation.

Chris: I believe—and this is, you know, it’s really tough with archaeology because you get a lot of incomplete records—

Ned: It’s true.

Chris: But I believe, and modern science agrees with me on this, the follow-up book to that is, “I Fucking Told You It Wasn’t Going To Work.”

Ned: [laugh]. I’m glad to know that the Sumerians were so blunt in their philosophy. There’s nothing aesthetic about it. I appreciate it.

Chris: I mean, it’s really, really hot in Sumeria.

Ned: Is it?

Chris: Sure.

Ned: Whenever people would bring up ancient civilizations like Babylon, Sumeria, et cetera, I always thought of them as, sort of, mythical places that didn’t actually exist on the modern map, and I was sad to realize at some point that was not true, and that these are actual locations that you can go to; they just have different names now.

Chris: Yeah, Ur still exists. I think it’s in Iraq.

Ned: I don’t like it [laugh]. Yeah… oh, well. Here we are. Let’s talk about another mythical thing that shouldn’t exist, but does. It’s BGP.

Chris: I’m not going to lie, that is, like, top ten transitions for you.

Ned: [laugh]. Thank you.

Chris: Might even be top five.

Ned: [laugh]. I felt really good about it, in part because it was completely organic and not planned. And now I’m ruining it by talking about it. So, another top five right there.

Chris: Different five.

Ned: Yes. So Chris—

Chris: What?

Ned: What’s your general feeling on BGP?

Chris: Anytime people start talking about it enthusiastically, I break a glass and walk away.

Ned: [laugh]. You don’t threaten them with it?

Chris: No, no, no, I just want the distraction. I understand and respect this conversation, but I don’t need it to be in my life at all.

Ned: It does seem like one of those mysteries of the faith when it comes to network engineering. Like, BGP, it’s overseen by wizards—

Chris: Oh, yeah.

Ned: And warlocks.

Chris: There are robes involved, incantations.

Ned: At least one animal sacrifice.

Chris: But not, like, a cute animal. They’re not monsters.

Ned: No. I’m trying to think of a non-cute animal, but they’re also adorable.

Chris: Only when they’re made into a Squishable.

Ned: Oh, that’s true. So, many Squish models. My house is infested with them. It’s a real Tribbles kind of situation. What were we talking about?

Chris: Uh, peanut butter?

Ned: Yes.

Chris: No, not again. Not again.

Ned: No, no, no, no, we’re not going down that again. Okay, so I want to start today’s episode with a story from 2019, a story that involves messing up the internet for, kind of, everyone. A story that begins with a small company in rural Pennsylvania. The main culprit: BGP, aka, Border Gateway Protocol. Chris, you may remember this, but for those who aren’t familiar, the small company involved is called Allegheny Technologies Incorporated.

And like any good technology company, when they needed to set up internet service, they didn’t just contract with one ISP, but instead they got connectivity from two, one from Verizon and one from a provider called DQE. That’s smart, you know? If DQE goes down, they can still get out through Verizon and people can reach them, et cetera, et cetera. You get the idea. Unfortunately, through a series of configuration errors and incompetence or laziness on the part of Verizon—

Chris: [gasp]

Ned: —shocking, I know [laugh]—deep breaths—large swaths of clients on the internet suddenly had their traffic routed through DQE to Allegheny Inc. And then back out through Verizon. An article on Cloudflare’s website compared it to routing all of the traffic for a major highway through a small suburban development. I think that’s actually an understatement. This would be like taking all the traffic from all the major highways in the United States and putting them through one small street in, like, gridlock Philadelphia.

Chris: Or, like an unpaved one lane road.

Ned: In Old City [laugh]. Yes. DQE and Allegheny obviously did not have the capacity to handle such a ridiculous increase in traffic, so they started dropping packets like crazy, and I’d imagine that one or more routers in the path just completely melted down. Eventually Cloudflare was able to reach engineers at DQE and get the situation resolved, but even with the fix in place, it took a few hours for the global internet to converge on the updated and now corrected routing. The Cloudflare article also details three different ways that this particular incident could have been avoided: specifically, prefix limits, IRR filtering, and RPKI. Don’t worry about what those things are just yet. We will get to them later, and by later I mean, next episode.

Chris: [laugh].

Ned: Probably. We’re going to use this little tale that I’ve told as a touchstone for this and however many more episodes it takes me to cover BGP.

Chris: My guess is ten.

Ned: Ahhh. I mean, at least. Minimum. I also plan on bringing on a real BGP expert in a later episode who can help us understand how to operate BGP securely because—spoiler—it’s horribly insecure right now.

Chris: Ahhh.

Ned: Yeah, shocking, I know. But first, what the hell is BGP, and how can it wreck a whole person’s day?

Chris: Or even half a person.

Ned: BGP history. I recommend drinking during this portion [laugh]. Okay, so, as I said earlier, BGP stands for Border Gateway Protocol. There’s a border, it involves a gateway, and it is a protocol. It is exactly what it says on the tin.

Chris: You needed 3500 words? You could have just said that? I thought this was going to be, like, a full episode.

Ned: Oh, no, that’s it. We’re done.

Chris: Yeah.

Ned: Everybody can go home. I explained it all. Okay, everybody that’s still here, let’s get into it. So, it is the exterior gateway protocol that the internet uses to figure out how to get packets from a source to a destination, and then back again. To understand why BGP exists and how it functions, we’re going to have to go back in time. Grab your best leg warmers, your heather gray sweatshirt, and red bandana because it’s time to get totally ’80s. No comment on that?

Chris: No I’m just a little offended that you used my current outfit as some kind of joke.

Ned: It was an inspiration, if you will. As we covered in a previous episode about DNS, the modern internet grew out of ARPANET, and its replacement NSFNET.

Chris: Which is totally different than NsfwNET, which we’ll talk about on a later episode.

Ned: [laugh]. That’s behind the Patreon paywall.

Chris: [laugh].

Ned: Ned and Chris after dark. If you want that, let us know. I think it’d be awful, but you know, you’re willing to pay for it [laugh].

Chris: Previous evidence has shown that no one will ever want that.

Ned: Okay, good [laugh]. NSFNET was established by the National Science Foundation, and its original intention was to connect five supercomputers in the US and various campus networks, tie them all together using a backbone network that NSF would help fund and manage. The backbone network was run by a single entity, and used leased lines from telcos that were running at a blazing 56 kilobits per second.

Chris: Oof, Mario Andretti.

Ned: Scorching. If you had a 56k modem in the early-’90s, you had the same network bandwidth as NSFNET at its inception in 1986. You probably didn’t have a supercomputer, but I mean, you had the effective bandwidth. NSFNET wasn’t open to just anyone. You couldn’t dial up and, you know, put it on the little cradle thing for your modem; they had a process by which regional networks could join.

And those regional networks in turn had to adhere to the acceptable use policy of NSFNET, which precluded using NSFNET for making money. This was supposed to be campuses, and universities, and educational institutions all coming together to do research and trade information. So, this wasn’t about making money. That comes later. The whole thing was overseen by Merit Network, which was a networking consortium out of Michigan, and they ran a network operation center, and they worked to design and implement the network connectivity that was used by the backbone.

Since the NSFNET formed the backbone of all of these different networks and their interconnectivity, there was a hierarchy, and all inter-network traffic had to traverse this backbone. So, if Regional Network A wanted to talk to Regional Network B, it would go up [background noise] to the backbone—what was that?

Chris: I didn’t drop my fidget toy. I don’t have a fidget toy.

Ned: —[laugh]—it would send the traffic up to the backbone, and then the backbone would take it to Regional Network B, and send the traffic back down. So, it was a relatively simple network when it comes to the interconnectivity between all these regional networks and the supercomputers. The NSFNET knew all the connected networks and could pretty easily route traffic from one network to another, but that design also came with a lack of resiliency and serious bandwidth constraints. You only had one path to the other regional network, and if the backbone went down or was congested, you were kind of out of luck. NSFNET had to pretty quickly upgrade their backbone from these 56 kilobit per second lines to T1 lines that ran at 1.5 megabits per second. That happened in 1988. And then they had to upgrade them again in 1991 to 45 megabits per second, which was known as a T3 line.

Besides increasing the speed of the leased lines that formed the NSFNET backbone, NSFNET added more lines, which introduced multiple paths for traffic to travel. At the same time, NSFNET was connecting with networks in other countries and to even more networks in the US, so handcrafting routing tables to move traffic efficiently was no longer viable. Back in the early-’80s, the internet engineering community was aware of the looming issues behind inter-network routing, and so they proposed what they called the Exterior Gateway Protocol in RFC 827. That was in 1982, and it was updated further in 1984. EGP was actually used by NSFNET, but it had some serious shortcomings, so in 1989, RFC 1105 proposed the Border Gateway Protocol to replace EGP. To make it even more confusing, all inter-network routing protocols are generically called ‘exterior gateway protocols.’ That’s not going to be confusing at all.

Chris: Definitely not.

Ned: The important thing to understand is that EGP as its own standard has since been retired. So, you can refer to EGP as broadly any protocol that handles this inter-network traffic. BGP itself is sometimes referred to as the three-napkin protocol, as the original ideas that underpin it were scribbled out by two engineers in Austin across three ketchup napkins. There’s no ketchup on the actual napkins; they were just, I guess, at a fast food place that served fries, and you were supposed to put ketchup on the napkins. I don’t know. Weird terminology.

Chris: Maybe the napkins were sponsored by big ketchup.

Ned: Ohhh. Heinz. Got to watch out. They get their paws into everything. Their red, yucky paws. That’s an awful visual, I’m sorry. So, while this story might seem apocryphal, they have actual pictures of the napkins. There are no ketchup stains, but the napkins do have the actual diagrams and sort of the flow for distributing routes in a BGP system.

Chris: All right, I’m going to ignore you for a minute and actually look this up because I’m curious.

Ned: [laugh]. Fair enough. BGP was not meant to be a long-term fix for the problems that NSFNET was experiencing, and that the larger internet would experience. It was just meant to be a relatively short-term fix to deal with the explosion of networks that were now forming the internet. The engineers really thought that they would come along later and replace it at some future point with a more robust and well-thought-out protocol. And that’s adorable.

Chris: Still searching. I’m sure what you’re saying is interesting.

Ned: Mm-hm. It’s a well-known fact that anything that you put into production, even if it’s supposed to be a temporary fix, will become a [laugh] a pillar of everything else that’s built later, and it’s going to be very hard to remove that pillar. BGP is no exception. They mapped it out in 1989, and we’re still waiting for its replacement. This is going to become important as we start to talk about BGP and its security controls, or its complete lack thereof.

They didn’t think they needed them because this was supposed to be a stopgap measure. BGP was iterated on quickly, with version two coming in 1990. So, that’s a year after the original idea. Version three came in 1991, and version four came in 1994. Version four is the current version of BGP in use by the internet today, so let’s talk about how it works. Unless you have some interesting information about these ketchup napkins.

Chris: Are you sure it wasn’t called the two-napkin protocol?

Ned: Nope. Three napkins. It had a picture of three napkins. It’s not the first thing to be drawn out on napkins, though. Because engineers—

Chris: We could do a whole episode on things that were drawn out on napkins.

Ned: [laugh]. Oh, and how they’re all universally terrible. [sigh].

Chris: Anyway.

Ned: So—

Chris: Back to whatever it is we—

Ned: BGP.

Chris: Which was—oh right, BGP. That’s what you were saying. Okay.

Ned: We’re going to—not napkins—

Chris: I’m back.

Ned: —but we can talk about napkins still. I have strong opinions. How expansive do we need to get here about BGP? I’m going to assume that most people listening know at least a bit about networking. At least, I hope so. Like, otherwise, why are you tuning into this podcast [laugh]? That’d be super weird. Except for you. Hi, mom.

Chris: Oh, don’t act like your mother listens.

Ned: It’s cruel and true. So, I’m going to take it as a given that most people know what an IP address is, are vaguely aware of TCP and how it works, and have at least heard of routing protocols, even if you don’t understand any of them, even RIP. Maybe the best thing here would be a packet walk. How does a packet on my desktop make its way to pod.chaoslever.com. Just pulling an address out of the air.

Chris: Totally random.

Ned: Totally random. First, my desktop has to figure out the IP address to send the web request to, and that’s a function of DNS. And Chris, as you know, we did two whole shows about DNS recently. Go look them up. Enjoy them. Pod.chaoslever.com is hosted on Podpage, which has a few different public IP addresses on the 216.239.32.0/19 network. Make sure you remember that. There will be a test later.

Once I have an IP address, how does my desktop know where to send that web request? How does it actually route the packet there? Well, my desktop’s networking stack has a route table in it. If you’re on a Windows box like me, open up a terminal and run the command ‘route print -4’. That will give you all the routes stored locally for IPv4. On Linux, it’s probably something like ‘ip route list.’ On Mac, I have no idea. I think it’s also ‘ip route list’ or something similar?

Chris: Correct.

Ned: This list determines where a packet is sent, with the most specific entry winning. Now, since the website I’m trying to contact has a public IP address, my desktop is going to use what’s called the default route, 0.0.0.0/0, which in my case points to the home router as the next hop, 192.168.1.1. I’m very creative. Yes, you’re welcome.
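The most-specific-entry-wins rule is called longest-prefix match. A minimal Python sketch, assuming a made-up three-entry route table shaped like a typical home desktop’s (the entries and the sample address are illustrative, not Ned’s real table):

```python
import ipaddress

# Hypothetical desktop route table: (destination prefix, next hop).
# Real tables also list interfaces and metrics; those are omitted here.
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1"),   # default route: send to the home router
    ("192.168.1.0/24", "on-link"),  # local LAN: deliver directly
    ("127.0.0.0/8", "loopback"),    # traffic to this machine itself
]


def next_hop(destination: str) -> str:
    """Pick the next hop for destination using longest-prefix match."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in ROUTES
        if address in ipaddress.ip_network(prefix)
    ]
    # The most specific match is the one with the longest prefix length.
    # 0.0.0.0/0 matches every IPv4 address, so matches is never empty here.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


# 216.239.32.10 is an arbitrary address inside 216.239.32.0/19.
print(next_hop("216.239.32.10"))  # 192.168.1.1 (only the default route matches)
print(next_hop("192.168.1.50"))   # on-link (/24 beats /0)
```

Every router along the path repeats this same lookup against its own table; what differs on an internet-facing router is how those entries get there.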

Chances are that is the [laugh] gateway of your home router as well. Once my packet hits that router, the router checks its own route table and decides where to send the traffic next. My router has a single WAN interface, and that WAN interface has a public IP address that was handed out by my ISP. There is a default route on my router that sends traffic to the next hop that my ISP lists, which is going to be some kind of router on their side that has its own routing table. My ISP is Verizon, and my packet may bounce around inside of the Verizon network for a while before emerging at one of their peering endpoints. And we’ll cover peering in a little bit.

So, we’ve gone from my desktop to my home router to one of Verizon’s routers, and then it bounces around inside of their network until it emerges to go get to Podpage. That network—Verizon’s network that’s all the various routers that they control—is what’s referred to as an autonomous system, or AS. That network is privately managed by Verizon, and all traffic inside their network is routed using whatever Interior Gateway Protocol they want to use. That’s an IGP. Wooo.

That could be IS-IS, OSPF, or even an internal version of BGP called iBGP. We’re not going to get into that; just know it exists. That internal routing protocol is going to decide where my packet emerges from the Verizon network. The path that my packet takes once it hits the border between Verizon and other autonomous systems will depend on external BGP and how it makes decisions. Each autonomous system on the internet gets an AS number, or ASN. The original ASN specification used 16 bits, so the maximum AS number was 65,535, because we count from zero.
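The arithmetic behind the 16-bit limit, alongside the 32-bit space that later replaced it, as a quick Python check:

```python
# Number of distinct AS numbers in each format. Counting starts at zero,
# so the largest ASN is one less than the count.
TWO_BYTE_COUNT = 2 ** 16   # original 16-bit ASNs
FOUR_BYTE_COUNT = 2 ** 32  # 32-bit ASNs (RFC 6793)

print(f"largest 16-bit ASN: {TWO_BYTE_COUNT - 1:,}")   # 65,535
print(f"largest 32-bit ASN: {FOUR_BYTE_COUNT - 1:,}")  # 4,294,967,295
```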

And just like IPv4, there is a range of ASNs that are reserved for private or internal use. So, if you were setting up iBGP, you would use those internal ASNs. The rest of them are managed by the Internet Assigned Numbers Authority, or IANA, which maybe has an acronym pronunciation, I’m not sure. Have you ever heard one?
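The private-use ranges are set aside in RFC 6996: 64512 through 65534 for 2-byte ASNs and 4200000000 through 4294967294 for 4-byte ASNs. A small Python helper, offered as a sketch, that checks whether an ASN falls in either range (it deliberately ignores other reserved values such as 0 and 65535):

```python
# Private-use ASN ranges from RFC 6996. Python ranges exclude the end value,
# so each range stops one past the last private ASN.
PRIVATE_ASN_RANGES = (
    range(64512, 65535),            # 64512 through 65534
    range(4200000000, 4294967295),  # 4200000000 through 4294967294
)


def is_private_asn(asn: int) -> bool:
    """True if asn is reserved for private use, e.g. inside an iBGP setup."""
    return any(asn in r for r in PRIVATE_ASN_RANGES)


print(is_private_asn(64512))  # True
print(is_private_asn(65535))  # False: reserved, but not for private use
```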

Chris: Uh, Jana?

Ned: Ayana? Eh. It’s IANA.

Chris: I think that was a Fleetwood Mac song.

Ned: Nice. [sigh]. Wonder where they got that name, the Internet Assigned Numbers Authority. They assign numbers. Blocks of ASNs are handed out from the IANA to regional internet registries, and those handle the actual assignment of ASNs to the networks that want them, like these regional networks.

When BGP was first implemented, 16 bits probably seemed like plenty, and it was also what routers were capable of handling at the time. In 2012, RFC 6793 expanded ASNs to use four octets, or 32 bits, which raised the number of available numbers to roughly 4 billion. Will that be enough? At the moment, current statistics show that regional internet registries have handed out about 130,000 ASNs, so, um… I think we’ll be all right, for a while.
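Two practical wrinkles from the move to 4-byte ASNs, sketched in Python: large ASNs are often written in asdot notation (RFC 5396), meaning the high 16 bits, a dot, then the low 16 bits; and when a 4-byte ASN has to fit into an old 2-byte AS field, the placeholder AS_TRANS, 23456 (RFC 6793), goes there instead. The helper names are illustrative:

```python
AS_TRANS = 23456  # stand-in value for a 4-byte ASN in a 2-byte AS field


def to_asdot(asn: int) -> str:
    """Format asn in asdot notation; ASNs below 65536 stay plain."""
    if asn < 2 ** 16:
        return str(asn)
    return f"{asn >> 16}.{asn & 0xFFFF}"


def two_byte_as_field(asn: int) -> int:
    """Value that goes into a legacy 2-byte AS field for this ASN."""
    return asn if asn < 2 ** 16 else AS_TRANS


print(to_asdot(65546))           # 1.10
print(two_byte_as_field(65546))  # 23456
```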

Chris: We’ll be good, I think. We’ll be good.

Ned: This is very different from the lack of available public IPv4 addresses because it’s not like every device gets an ASN. Every large network gets one. Still, though, that’s 130,000 public-facing ASNs that BGP has to worry about when it comes to routing your packets. This thing has to be scalable. So, how does it do that?

Chris: I thought we already established that: magic.

Ned: Yes. That’s essentially what it is. And if you want to stop there, and just know that that’s what BGP is responsible for, you can ignore the next, like, ten minutes [laugh]. To get into some of the detail—and we’re not going to get down to nitty gritty here, but just some of the detail—BGP is what’s called a path vector routing protocol, which means that it selects a specific path for a route based on attributes. Vector is the direction and path is the selection.

BGP doesn’t understand or care about things like bandwidth, or latency, or even hops, really. Instead, it has a path selection algorithm that walks through the attributes of each possible path for a packet, and then picks one based on the selection criteria. We’ll get into the actual process it uses in a moment, but where is it getting this information from? From its neighbors. Oh, they have neighbors. It’s like a community. And there’s also communities [laugh].

Chris: I would just like to pause and remind everybody that Ned explicitly said he wasn’t going to get into the nitty-gritty.

Ned: I’m not [laugh].

Chris: That’s the thing.

Ned: This is the high-level stuff [laugh]. It gets so much deeper.

Chris: No, no, I just wanted to point that out to explain to people a little more justification as to why my run away screaming protocol is what I operate upon when BGP comes up in quiet conversation.

Ned: Right. All right, so if I’m a BGP—I’m a router running BGP, you can call me a node—I form relationships with other routers running BGP through what’s called neighborships. I don’t like the term, but apparently it’s used.

Chris: Please tell me that’s not real.

Ned: That’s real. I’m sorry. Setting up a neighborship is very, very simple. Let’s say we’ve got two routers: Router A and Router B. On Router—

Chris: I just got—oh, my God.

Ned: What?

Chris: Neighborship?

Ned: Neighborship. I heard it first, and that was like that can’t possibly be the real term. They’re also called peers, and I like that better, but that gets into the difference between peering and transit. And so…

Chris: Can you hold on for one second, I got to go get a glass.

Ned: [laugh]. Smash it real hard. [sigh]. The problem is that we use the same terms to mean too many different things in technology, and so sometimes we just got to make up a word, and it’s not always good. Anyway.

So, let’s say I have two routers: Router A, Router B. On Router A, I tell it the IP address of Router B and its ASN. And then over on Router B, I tell it the IP address of Router A and its ASN. On Router A, I add any networks that I want to advertise, and same thing for Router B, and that’s it. The two routers will establish a TCP connection over port 179, and start exchanging route information.
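Once the TCP session on port 179 is up, the first thing each side sends is an OPEN message carrying its BGP version, its ASN, a proposed hold time, and its router ID (the BGP Identifier). A minimal Python sketch that only builds the bytes of a bare RFC 4271 OPEN, assuming a private 2-byte ASN and made-up values; it never connects to anything, and real implementations also advertise optional capabilities, such as 4-byte ASN support, that are left out here:

```python
import ipaddress
import struct

BGP_PORT = 179         # BGP peers talk over TCP port 179
MARKER = b"\xff" * 16  # every BGP header starts with 16 bytes of all ones
OPEN = 1               # message type code for OPEN


def bgp_header(msg_type: int, body: bytes) -> bytes:
    """Prepend the 19-byte header: marker, 2-byte total length, 1-byte type."""
    return MARKER + struct.pack("!HB", 19 + len(body), msg_type) + body


def bgp_open(my_asn: int, hold_time: int, router_id: str) -> bytes:
    """Build an OPEN message with no optional parameters."""
    body = struct.pack(
        "!BHH4sB",
        4,                                        # BGP version 4
        my_asn,                                   # "My Autonomous System", 2 bytes
        hold_time,                                # proposed hold time, seconds
        ipaddress.IPv4Address(router_id).packed,  # BGP Identifier (router ID)
        0,                                        # no optional parameters
    )
    return bgp_header(OPEN, body)


message = bgp_open(my_asn=64512, hold_time=90, router_id="10.0.0.1")
print(len(message))  # 29 bytes, the smallest possible OPEN
```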

Each router will share the networks that it is advertising and any networks it learned about from other routers. And BGP only sends messages across that link when there’s an update to its advertised routes. So, unlike something like RIP, which every 30 seconds goes, “Here’s all my routes.” “Here’s all my routes.” That would be bad and awful, so BGP just sends information when something changes about one of the advertised routes. Otherwise, it just hangs out, chills, plays Pinochle, and every 30 or 60 seconds, it sends a keep-alive saying, “Yep, I’m still here. I got nothing new to say.” Kind of like you, Chris. I check in every 30 to 60 seconds to make sure you’re still here [laugh].
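The keep-alive itself is tiny: a bare 19-byte header with message type 4 and no body. How often it is sent follows from the hold time the two neighbors agree on in their OPEN messages; RFC 4271 has them use the smaller of the two proposals and suggests a keep-alive interval of one third of it. A short Python sketch of both (the function names are illustrative):

```python
import struct

MARKER = b"\xff" * 16
KEEPALIVE = 4  # message type code for KEEPALIVE


def bgp_keepalive() -> bytes:
    """A KEEPALIVE is just the BGP header with no body."""
    return MARKER + struct.pack("!HB", 19, KEEPALIVE)


def negotiate_timers(local_hold: int, peer_hold: int) -> tuple[int, int]:
    """Return (hold_time, keepalive_interval) in seconds for a session."""
    hold_time = min(local_hold, peer_hold)  # the smaller proposal wins
    # One third of the hold time; a hold time of 0 disables keep-alives.
    return hold_time, hold_time // 3


print(len(bgp_keepalive()))        # 19
print(negotiate_timers(180, 90))   # (90, 30)
print(negotiate_timers(180, 180))  # (180, 60)
```

If nothing at all arrives from a neighbor before the hold time runs out, the session is torn down and the routes learned from that neighbor are withdrawn, which is why these small messages matter.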

Chris: As usual, I’ve got nothing new to say.

Ned: [laugh]. Indeed, the routing decisions made by Router A will depend on the advertisements it gets from its neighbors. So, so far, we’ve just got Router A and B, but we can add additional routers as neighbors: C, D, and E. Router A learns about routes to different networks from all of these neighbors, and then makes path-based decisions based on the routes that it learned. BGP network advertisements can have a ton of attributes, but there’s really only about eight standard ones that are commonly used, and honestly, there’s probably only about three or four that actually matter, so we’re just going to talk about those.

Chris: Thank God.

Ned: Yes. Local preference is an attribute that lets you prefer one route over another. I could give Router B preference over Router C. Very straightforward. If both routers are an option for a given destination, the one with the higher preference gets the nod. So, Router B would get—I’d send my traffic to Router B instead of Router C.

That’s useful if, say, the link on Router B is a ten gig link and the link to Router C is one gig. I probably want to use the link to Router B if I can help it. BGP doesn’t know about link speed, but you do. The next attribute is AS path length. The AS path is a list of every autonomous system a packet will pass through, from source to destination.

So, when a router learns about a route from one of its neighbors and wants to share that route with the next router in line, it tacks its AS number onto the front of the AS path. So, the more autonomous systems a route travels through, the longer the path length becomes, and that makes it less preferred as a path to choose. That doesn’t mean that the shorter AS path route is actually faster, it just means that it’s shorter. Inside that autonomous system, there could be way more hops between the ingress and egress routers, so that’s why you might want to use something like local preference if you know that, say, Joe’s ISP and Crab Shack kind of sucks at passing traffic.
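A tiny Python sketch of how the AS path grows as a route is passed along, using documentation ASNs (64496 through 64511 are reserved for examples) in place of real networks. Each autonomous system that advertises the route onward to an external neighbor adds its own ASN at the front, so the rightmost entry is always the network that originated the route:

```python
def advertise(as_path: tuple[int, ...], my_asn: int) -> tuple[int, ...]:
    """AS path sent to an external neighbor: own ASN on the front (left)."""
    return (my_asn,) + as_path


# Hypothetical route originated by AS 64500 and passed through two more ASes.
path = advertise((), 64500)    # (64500,)
path = advertise(path, 64501)  # (64501, 64500)
path = advertise(path, 64502)  # (64502, 64501, 64500)

print(path, "length:", len(path))  # (64502, 64501, 64500) length: 3
```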

Chris: Phenomenal crabs, though.

Ned: Really good crabs. The last attribute is the router ID. If all other attributes for a route are the same, the lower router ID wins. Where does that router ID come from? That’s weird. It’s kind of up to the admin. It looks exactly like an IPv4 address, and it’s usually taken from the IP address of a loopback interface on the router.
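Put together, the three attributes above give a simplified tie-breaking order: highest local preference, then shortest AS path, then lowest router ID. A Python sketch of just that order, with hypothetical neighbors and values; real BGP implementations evaluate several more attributes in between (origin and MED, for example), so this is not a complete best-path algorithm:

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    neighbor: str             # label for the advertising neighbor (illustrative)
    local_pref: int           # higher is preferred
    as_path: tuple[int, ...]  # shorter is preferred
    router_id: str            # BGP identifier of the advertising router


def preference_key(route: Route) -> tuple[int, int, int]:
    # min() picks the smallest key, so local preference is negated to make
    # the highest value sort first; router IDs compare as 32-bit integers.
    return (
        -route.local_pref,
        len(route.as_path),
        int(ipaddress.IPv4Address(route.router_id)),
    )


def best_path(candidates: list[Route]) -> Route:
    return min(candidates, key=preference_key)


# Router B sits behind the 10 gig link, so the admin gives it a higher local preference.
b = Route("Router B", 200, (64501, 64502, 64500), "10.0.0.2")
c = Route("Router C", 100, (64500,), "10.0.0.3")
d = Route("Router D", 100, (64500,), "10.0.0.1")

print(best_path([b, c]).neighbor)  # Router B: local preference beats a shorter path
print(best_path([c, d]).neighbor)  # Router D: everything else equal, lower router ID wins
```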

The router ID needs to be unique within an individual autonomous system and unique among its peers. So, you know, you can’t have two routers in the same neighborship—so sorry—that have the same router ID. Bad things will happen. Speaking of peers—back to our packet walk—the request has now left the Verizon network and it’s gone to some other network based on advertised routes. The Verizon router made a decision based on the path attributes for each route.

Where is this all happening? Physically, where’s this actually happening? It’s at an internet exchange point of some kind—most likely—where a peering or transit arrangement has been created between two or more routers. So, at this point, we’re kind of done with BGP, but that led me to another rabbit hole, which is okay, I understand the theory. Where’s all this stuff actually happening?

And it’s happening at these dedicated colocation facilities and internet exchange points. They used to be called NAPs, which was like Network Access… something. And there was a place called a SUPERNAP, down in Virginia, I think, where there were, like, a metric shit ton of these different ISP lines all coming into the same facility. I don’t know if it’s still called the SUPERNAP.

Chris: I think I’m lined up for a super nap, if you know what I’m saying.

Ned: I do. I set you up for that one. You’re welcome. So, this isn’t entirely relevant to BGP, except it filled in some mental gaps for me. How are two autonomous systems connected? Well, they’re connected by two routers, but there are two basic physical topologies that are followed: you can have a public peering arrangement between a bunch of ASs, and that usually happens at one of these internet exchange points, or in rented colocation space from a neutral provider. Equinix or Digital Realty would be examples.

Each ISP’s router will be connected into a common switch fabric, and peering relationships will be formed between each router that’s connected into the switch. So, they’re all exchanging routing information with each other. The other option is a direct router-to-router connection between two ASs. That’s known as private peering. If you’ve ever been involved in setting up a connection to AWS with Direct Connect, or Azure with Express Connect—or ExpressRoute. Sorry, stupid names—both of those use private peering and a direct physical connection from your network to Azure or AWS.

You have to set up what’s called a cross-connect, which is essentially a cable that runs from your router, or a router that you’re leasing through your ISP, to the router or the switch that the cloud router is hooked into. There’s also a slight difference between peering and transit. Peering means that I can send traffic to your network, and you can send traffic to my network, and we don’t charge each other any money for accepting that traffic. Consider a scenario where you have a few different regional networks that want to pass network traffic between each other, rather than sending the traffic across a transit network. They can all rent space together at a colocation data center, and set up a public peering arrangement where they’ll exchange routes and pass traffic. It’s beneficial for all the networks involved to be able to communicate freely, and there’s a verbal peering agreement, or handshake agreement, to not be an asshole about it.

Chris: [laugh].

Ned: I’m serious. They’re like, “Just don’t be a dick. Don’t overwhelm my network with traffic that’s destined for somewhere else. Don’t try to use me as a transit network, and we’ll all get along.” And yes, I’m very serious. A study in 2011 showed that only 0.05% of peering agreements were actual written contracts. I imagine that’s grown in the last 13 years with the explosion of cloud where, like, if you want a peering agreement with Azure, it is absolutely a written contract, but from what I’ve heard, that’s in the minority. These regional networks are still using just handshakes and, like, firm nods at each other.

Transit relationships are where a network is paying another network for access to the general internet. There’s a few giant tier one operators that lots of other networks pay to transmit their traffic across the internet. A regional network in, say, Luxembourg is unlikely to have a direct peering relationship with a network in Omaha, Nebraska, so that traffic needs to transit through another provider. That provider doesn’t see a mutual benefit for providing that transit, so they charge for it. Tier one networks are those networks that can reach all other networks on the internet using settlement-free peering.
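The unwritten don’t-use-me-as-transit rule is often modeled as an export policy keyed on the business relationship with each neighbor: customer (pays you for transit), provider (you pay them), or peer (settlement-free). This is commonly described as valley-free routing. A minimal Python sketch of that model, assuming each neighbor has already been labeled; real routers express this with route policies and filters, not a function like this:

```python
from enum import Enum


class Relationship(Enum):
    CUSTOMER = "customer"  # pays us for transit
    PEER = "peer"          # settlement-free exchange
    PROVIDER = "provider"  # we pay them for transit


def should_export(learned_from: Relationship, send_to: Relationship) -> bool:
    """Decide whether to advertise a route to a neighbor."""
    if learned_from is Relationship.CUSTOMER:
        # Customers pay us to reach them, so everyone may hear their routes.
        return True
    # Routes from peers or providers go only to paying customers; announcing
    # them to another peer or provider would offer free transit through us.
    return send_to is Relationship.CUSTOMER


print(should_export(Relationship.CUSTOMER, Relationship.PEER))      # True
print(should_export(Relationship.PROVIDER, Relationship.PROVIDER))  # False
```

The second call is the case where a network with two upstream providers passes routes learned from one provider on to the other, turning itself into a transit path it was never built to be.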

Tier two networks have to pay for at least some transit to other networks. And tier three networks pay for transit to all networks. Who are these mysterious tier one providers? Well, Verizon is one. So is AT&T, and Comcast, and Lumen, who you might not have heard of, but that’s because they used to be called CenturyLink. They changed their name because they had a terrible reputation, and that was going to help. They’re also the biggest tier one provider in the world as far as I can tell.

Going back to our packet walk, and to round this all out: since Verizon is a tier one network, my packet doesn’t have to go across another transit network to get to Podpage. I looked it up, and Podpage is actually using Google Cloud to host their service. So, when I looked at it, the public IP addresses Podpage is using lined up to Google’s ASNs, and so my little packet will go directly from Verizon’s network to Google. No other transit required. And in fact, that’s exactly what it does.

Through the magic of traceroute, I can see my packet hop from Verizon, to Verizon Business, to Google, to another Google AS because they have multiples. BGP has done its job, and all is well with the internet. But what if it isn’t?

Chris: [laugh].

Ned: How can BGP break? And can people do it on purpose? The answers will shock you. I—they probably won’t shock you [laugh]. The answer is there are many ways to break BGP, and yes, it can be done on purpose. But that is a story for another time, a future episode, and a guest who’s more eloquent than me at explaining security issues with BGP. [sigh]. You feel better?

Chris: No.

Ned: Have I demystified some of the magic of the internet for you?

Chris: I’m more confused than when I started, and I didn’t think that was possible.

Ned: Good. Then my job… [laugh] is a complete success. My job here is done. Hey, thanks for listening or something. I guess you found it worthwhile enough if you made it all the way to the end, so congratulations to you, friend, you accomplished something today. Maybe. Now, you can sit on the couch, think about the magic of BGP, and just get hopelessly confused like the rest of us. You’ve earned it.

You can find more about this show by going to our LinkedIn page, just search ‘Chaos Lever,’ or go to the website, pod.chaoslever.com, where you’ll find show notes, blog posts, and general tomfoolery, and you can leave a comment that we might read on the Tech News of the Week. We’ll be back next week to see what fresh hell is upon us. Ta-ta for now.

Chris: And just to make things even more unnecessarily confusing, it was originally called the two-napkin protocol, when it was first proposed and first published in a Cisco internal blog in 1989.

Ned: [laugh]. And then a third napkin arose? Oh, no.

Chris: Look, I mean, math is hard.
