nanog mailing list archives
RE: CGNAT growing pains
From: "Howard, Lee via NANOG" <nanog () nanog org>
Date: Wed, 9 Oct 2024 16:17:40 +0000
First, roll out IPv6 if you haven't yet. That should relieve a lot of pressure on your pool size, and gives customers a
workaround for some of the weird things ("Use the IPv6 address instead of IPv4.").
Second, build your own geofeed. You can create a CSV providing as much detail as you want, down to "This individual
address is at this long/lat" if you want. Then publish the location of that file in whois.
Short pointer: https://mailman.nanog.org/pipermail/nanog/2022-April/219080.html
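As a rough illustration of the CSV mentioned above, here is a minimal sketch that builds a geofeed, assuming the RFC 8805 column order (prefix, country, region, city, postal code). The prefixes are RFC 5737 documentation ranges and the city assignments are hypothetical. It also assumes the external pool has been split into per-city prefixes. The function name is made up for this example.

```python
import csv
import io
import ipaddress


def build_geofeed(entries):
    """Return geofeed CSV text with columns: prefix,country,region,city,postal."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for prefix, country, region, city in entries:
        # Validate the prefix; raises ValueError if host bits are set or it is malformed.
        net = ipaddress.ip_network(prefix, strict=True)
        # Postal code is left empty here.
        writer.writerow([str(net), country, region, city, ""])
    return out.getvalue()


# Hypothetical pools using documentation prefixes, not real assignments.
EXAMPLE_POOLS = [
    ("192.0.2.0/25", "US", "US-FL", "Miami"),
    ("192.0.2.128/25", "US", "US-FL", "Jacksonville"),
]

if __name__ == "__main__":
    print(build_geofeed(EXAMPLE_POOLS), end="")
```

The resulting file would be served over HTTPS, with its URL referenced from the whois record for the covering block (RFC 9092 describes one way to do that).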
After you've rolled out IPv6 you can consider 464xlat or MAP-T. Both work well, but both require support from the CPE.
I've heard of a custom implementation that kicks a customer off the CGN/xlat/BR if it detects UPnP (i.e., a customer
that needs port forwarding). It requires reprovisioning the CPE and a reboot, but two minutes of downtime probably
prevents a support call.
Lee Howard
IPv4.Global
-----Original Message-----
From: NANOG <nanog-bounces+leehoward=hilcostreambank.com () nanog org> On Behalf Of Jon Lewis
Sent: Tuesday, October 8, 2024 3:19 PM
To: nanog () nanog org
Subject: CGNAT growing pains
We started rolling out CGNAT about 6 months ago. It was smooth sailing for the first few months, but we eventually did
run into a number of issues.
Our customer base is primarily FTTH with "dynamic" IP assignment via DHCP.
Since connections are always-on, customer ONTs/routers get an IP assigned, and then when the lease is renewed, they
request a new lease for the existing IP, and, in general, that request is granted. This gives customers the mistaken
impression they have a static IP. My impression, from working with some customers who've needed to be moved from
CGNAT back to public IP, is that customers who are doing port-forwarding don't even bother with dynamic DNS. They just
know they can connect to their IP, as they've never seen it change. We do offer/sell static IP, but pre-CGNAT, it was
strictly for business customers, i.e., a residential customer could only get static IP service by converting their
account to a business account. That may change in the near future.
One issue we didn't foresee has been IP Geo. We all knew that streaming services like Netflix use IP Geo
to determine what content should be made available, but that's, AFAIK, limited to country or region.
What we didn't anticipate is services like Hulu Live TV doing IP Geo down to the city level to determine which local
channels are a subscriber's local channels. We're using Juniper MX gear and SPC3 cards for our CGNAT routers, each one
having a single large external pool. Since we serve most of FL, one external pool can't IP Geo correctly for customers
as far apart as Miami and Jacksonville hitting the same CGNAT router. We don't currently have an acceptable solution
to this other than moving impacted customers off CGNAT.
One of the great unknowns (at least for us) with CGNAT was what our PBA settings should be, i.e., how large each
port-block should be and how many port-blocks to allow per customer. We started with 256x4 (256 ports per block, 4 blocks per customer). It seemed to work. We
eventually noticed that we were logging port-block exceeded errors. This is one aspect where Juniper's CGNAT support
is lacking.
There's a counter for these errors, and it's available via SNMP, but there's no way to attribute the errors to
subscriber IPs. We're polling the MIB and graphing it, so we know it's a continuing issue and can see when it's
incrementing faster/slower, but Junos provides no means for determining if "PBEs" are all being caused by a single
customer, a handful of customers, etc. We have a JTAC case open on this. As a quick & hopeful fix, we increased both
the port-block size and the block limit. That helped, but didn't stop the errors. It also cut our CGNAT ratio by more
than half (64:1 -> 28:1). If we stay at this ratio, we'll need much larger external pools than originally anticipated.
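The relationship between PBA settings and the CGNAT ratio can be sketched with some simple arithmetic. This assumes worst-case sizing (every subscriber may hold its full block allowance) and that all 65536 ports of each external IP are usable, which is the assumption under which 256x4 yields the quoted 64:1. The function names and the 10,000-subscriber figure are hypothetical.

```python
def ports_per_subscriber(block_size, max_blocks):
    """Maximum ports one subscriber can hold under PBA."""
    return block_size * max_blocks


def subscribers_per_ip(block_size, max_blocks, usable_ports=65536):
    """Worst-case subscribers per external IP (the CGNAT ratio)."""
    return usable_ports // ports_per_subscriber(block_size, max_blocks)


def port_budget_for_ratio(ratio, usable_ports=65536):
    """Ports available to each subscriber at a given ratio."""
    return usable_ports // ratio


def external_ips_needed(subscribers, ratio):
    """External pool size needed for a subscriber count (ceiling division)."""
    return -(-subscribers // ratio)


if __name__ == "__main__":
    print(subscribers_per_ip(256, 4))      # 64
    print(port_budget_for_ratio(28))       # 2340
    print(external_ips_needed(10_000, 64)) # 157
    print(external_ips_needed(10_000, 28)) # 358
```

Under these assumptions, going from 64:1 to 28:1 more than doubles the external addresses needed for the same subscriber count.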
Tuning these settings is kind of painful as JTAC strongly recommends bouncing the CGNAT service any time CGNAT-related
config changes are made. This means briefly breaking Internet access for all CGNAT'd customers. For the PBEs, JTAC's
suggestions so far have been to shorten some of the timeouts in the config and to keep doing what we're doing, which is
a cron job that essentially does a "show services nat source port-block", parses the output looking for subscriber IPs
that have used up the ports in several of their port-blocks, then does a "show services sessions source-prefix ..." and
logs all of this. This at least gives us snapshots of "who's a heavy user right now" and lets us look at how they were
using all their ports, i.e., was it BitTorrent, are they compromised and scanning the internet for more systems to
compromise, is it legit-looking traffic - just lots of it, etc.?
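The "find subscribers who have filled several port-blocks" step of such a cron job might look roughly like the sketch below. It deliberately does not parse real Junos output: the `PortBlock` record (subscriber IP, ports used, ports total) is a hypothetical, already-normalized form that whatever parses the CLI output would need to produce, and the thresholds are illustrative.

```python
from collections import defaultdict
from typing import NamedTuple


class PortBlock(NamedTuple):
    """Hypothetical normalized record for one allocated port-block."""
    subscriber_ip: str
    ports_used: int
    ports_total: int


def heavy_users(blocks, min_full_blocks=2, full_fraction=1.0):
    """Return subscriber IPs holding at least min_full_blocks exhausted blocks."""
    full_count = defaultdict(int)
    for block in blocks:
        # A block counts as exhausted once usage reaches the chosen fraction.
        if block.ports_total > 0 and block.ports_used >= full_fraction * block.ports_total:
            full_count[block.subscriber_ip] += 1
    return sorted(ip for ip, n in full_count.items() if n >= min_full_blocks)


if __name__ == "__main__":
    snapshot = [
        PortBlock("100.64.0.10", 256, 256),
        PortBlock("100.64.0.10", 256, 256),
        PortBlock("100.64.0.10", 12, 256),
        PortBlock("100.64.0.11", 256, 256),
        PortBlock("100.64.0.12", 3, 256),
    ]
    print(heavy_users(snapshot))  # ['100.64.0.10']
```

Each IP returned would then be the argument for the per-subscriber session lookup described above.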
The latest CGNAT issue involves a customer with a Palo Alto Networks firewall connected to our network, several of
whose employees are our FTTH customers. On their PANW firewall, they're doing IP Geo based filtering, limiting access to
internal servers to "US IPs". Since we only CGNAT traffic to the external Internet, their on-net employees hit the
firewall from their 100.64/10 IPs and get blocked. I suggested they whitelist 100.64/10, saying we block traffic from
100.64/10 from entering our network via peering and transit, so they can be assured anything from 100.64/10 came from
inside our network / our customers. They say the firewall won't let them whitelist 100.64.0.0/10,
giving an error that it's invalid IP space.
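For reference, 100.64.0.0/10 is the shared address space reserved by RFC 6598 and is a well-formed prefix. A minimal sketch of the membership test a filter would need, using Python's standard ipaddress module (the function name is hypothetical and says nothing about how the PANW firewall evaluates the range):

```python
import ipaddress

# RFC 6598 shared address space, used between CGNAT and subscriber CPE.
SHARED_ADDRESS_SPACE = ipaddress.ip_network("100.64.0.0/10")


def is_on_net_cgnat(source_ip):
    """True if source_ip falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(source_ip) in SHARED_ADDRESS_SPACE


if __name__ == "__main__":
    # First and last addresses of the range.
    print(SHARED_ADDRESS_SPACE[0], SHARED_ADDRESS_SPACE[-1])
    print(is_on_net_cgnat("100.64.12.34"))   # True
    print(is_on_net_cgnat("100.128.0.1"))    # False
```

The range runs from 100.64.0.0 through 100.127.255.255.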
I know we're not the first to implement CGNAT, so I'm curious if others have run into these sorts of issues, or others
we haven't run into yet, and if so, how you solved them.
----------------------------------------------------------------------
 Jon Lewis, MCP :)              |  I route
 Blue Stream Fiber, Sr. Neteng  |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
