nanog mailing list archives

RE: Peering/Transit eBGP sessions -pet or cattle?


From: <adamv0025 () netconsultings com>
Date: Thu, 13 Feb 2020 09:49:00 -0000

Baldur Norddahl
Sent: Wednesday, February 12, 2020 7:57 PM

On Tue, Feb 11, 2020 at 12:33 AM Lukas Tribus <mailto:lists () ltri eu> wrote:
Therefore, if being down for several minutes is not ok, you should 
invest in dual links to your transits. And connect those to two 
different routers. If possible with a guarantee the transits use two 
routers at their end and that divergent fiber paths are used etc.

That is not my experience *at all*. I have always seen my prefixes 
converge in a couple of seconds upstream (vs 2 different Tier1's).

This is a bit old but probably still holds:

https://labs.ripe.net/Members/vastur/the-shape-of-a-bgp-update

Quote: "To conclude, we observe that BGP route updates tend to 
converge globally in just a few minutes. The propagation of newly 
announced prefixes happens almost instantaneously, reaching 50% 
visibility in just under 10 seconds, revealing a highly responsive 
global system. Prefix withdrawals take longer to converge and generate 
nearly 4 times more BGP traffic, with the visibility dropping below 10% only after approximately 2 minutes".

Unfortunately they did not test the case of withdrawal from one router 
while having the prefix still active at another.

Yes, that's unfortunate.
Although I'm thinking that the convergence time would be highly dependent on the first-hop upstream providers involved 
in the "local-repair" for the affected AS. Once that is done, it doesn't matter that the rest of the world still routes 
traffic for the affected AS towards the original first-hop upstream AS, as long as that AS has a valid detour route.
And I guess the topology and configuration of this first-hop periphery around the affected AS, i.e. the boxes involved 
in the "local-repair", would dictate the convergence time.
E.g. if your upstream A box happens to have a direct (usable) link/session to the upstream B box, that's the winning 
case. However, the more boxes involved in the "local-repair" detour that need to be told "A no more, now B is the way 
to go", the longer the convergence time.
But if a significant portion of the Internet gets the withdrawal within 2 min, I wonder how long it could take for a 
typical "local-repair" string of BGP speakers to all get the memo.
Realistically, though, how many BGP speakers could that be? Ranging from a minimum of 2 to a maximum of... say ~6?
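
To put the "string of BGP speakers" reasoning into numbers, here is a toy model only. It assumes the update travels 
down the chain one speaker at a time and that every speaker adds the same fixed delay before passing it on. The 
function name and the 0.5 s per-hop figure are made up for illustration, not measurements; real per-hop delay depends 
on things like advertisement timers, CPU load and FIB programming speed.

```python
def local_repair_time(n_speakers: int, per_hop_delay_s: float) -> float:
    """Toy estimate: time until the last speaker in the chain has acted on the update.

    Assumes strictly sequential propagation and an identical delay at each speaker.
    """
    if n_speakers < 1:
        raise ValueError("need at least one BGP speaker in the chain")
    return n_speakers * per_hop_delay_s


# Hypothetical per-hop delay of 0.5 s, for the 2..6 speaker range guessed at above.
for n in range(2, 7):
    print(n, "speakers ->", local_repair_time(n, 0.5), "s")
```

Under these assumptions the repair time grows linearly with the number of boxes that need to be told, which is the 
point made above: the shorter the detour chain, the faster the repair.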
   



When I saw *minutes* of brownouts in connectivity it was always 
because of ingress prefix convergence (or the lack thereof, due to 
slow FIB programming, then temporary internal routing loops, nasty 
things like that, but never external).

That is also a significant problem. In the case of a single transit 
connection per router, two routers and two providers, there will be a 
lot of internal convergence between your two routers in the case of a 
link failure. That is also avoided by giving both routers the same provider connections.
That way a router may still have to invalidate many routes but there 
will be no loops and the router has loop free alternatives loaded into 
memory already (to the other provider). Plus you can use the simple 
trick of having a default route as a fall back.
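
The default-route trick can be sketched with a plain longest-prefix-match lookup. This is a minimal illustration, not 
router code: the `lookup` helper, the documentation prefix 192.0.2.0/24 and the names "provider-A" and "provider-B" 
are all assumptions made up for the example.

```python
import ipaddress


def lookup(fib, dst):
    """Longest-prefix match: return the next hop of the most specific covering route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, nh) for net, nh in fib.items() if addr in net]
    if not matches:
        return None
    # The route with the longest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


fib = {
    ipaddress.ip_network("0.0.0.0/0"): "provider-B",     # default route as fallback
    ipaddress.ip_network("192.0.2.0/24"): "provider-A",  # specific route learned via A
}

before = lookup(fib, "192.0.2.10")

# The session to provider A fails and its specific route is invalidated.
del fib[ipaddress.ip_network("192.0.2.0/24")]

after = lookup(fib, "192.0.2.10")
print(before, "->", after)
```

While the specific route is present it wins. Once it is gone, the same destination still matches the default, so 
traffic keeps flowing while the remaining routes are being reprogrammed.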

This is a very good point actually. Since the box has two transit sessions, if only one of them fails it will still 
retain all the prefixes in the FIB. It will just need to reprogram a few next-hops to point towards the other 
eBGP/iBGP speakers, whichever offers the best path. And reprogramming next-hops is significantly faster (with 
hierarchical FIBs anyway).
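
A rough sketch of why that is faster, assuming a simplified model in which a flat FIB stores the next hop per prefix 
while a hierarchical FIB stores a shared pointer per prefix and resolves it through one extra table. The class names, 
the `fail_over` method and the prefix counts are hypothetical and do not describe any particular vendor's 
implementation.

```python
class FlatFib:
    """Each prefix stores its next hop directly."""

    def __init__(self):
        self.routes = {}  # prefix -> next hop

    def fail_over(self, old, new):
        writes = 0
        for prefix, nh in self.routes.items():
            if nh == old:
                self.routes[prefix] = new  # one write per affected prefix
                writes += 1
        return writes


class HierarchicalFib:
    """Prefixes point at a shared path group; the group holds the next hop."""

    def __init__(self):
        self.routes = {}  # prefix -> group id
        self.groups = {}  # group id -> next hop

    def resolve(self, prefix):
        return self.groups[self.routes[prefix]]

    def fail_over(self, old, new):
        writes = 0
        for gid, nh in self.groups.items():
            if nh == old:
                self.groups[gid] = new  # one write per affected group
                writes += 1
        return writes


flat, hier = FlatFib(), HierarchicalFib()
hier.groups["via-A"] = "transit-A"
for i in range(1000):
    prefix = f"10.{i // 256}.{i % 256}.0/24"
    flat.routes[prefix] = "transit-A"
    hier.routes[prefix] = "via-A"

flat_writes = flat.fail_over("transit-A", "transit-B")
hier_writes = hier.fail_over("transit-A", "transit-B")
print(flat_writes, hier_writes)
```

In this model the flat table touches every affected prefix, while the hierarchical one rewrites a single shared 
entry and all prefixes pointing at it follow immediately.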

adam


