nanog mailing list archives

Re: Recommended DNS server for a medium 20-30k users isp


From: Mark Andrews via NANOG <nanog () lists nanog org>
Date: Sun, 10 Aug 2025 07:01:07 +1000

Firewalls have a long history of breaking DNS.

They have been known to throw away UDP fragments. This breaks responses that exceed the path MTU. There is a myth
that IPv6 doesn’t have fragments, so they can just be blocked, which makes IPv6 particularly bad in this respect.
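As a rough illustration of the path MTU arithmetic behind the fragment problem, here is a minimal Python sketch. It
assumes plain 20-byte IPv4 and 40-byte IPv6 headers with no options or extension headers; the function names are made
up for illustration.

```python
# Fixed header sizes in bytes; assumes no IPv4 options or IPv6 extension headers.
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8


def max_unfragmented_payload(path_mtu: int, ipv6: bool) -> int:
    """Largest DNS message that fits in one unfragmented UDP datagram on this path."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - UDP_HEADER


def needs_fragmentation(dns_len: int, path_mtu: int, ipv6: bool) -> bool:
    """True if a UDP DNS response of dns_len bytes must be fragmented,
    i.e. it only arrives if no box on the path drops the fragments."""
    return dns_len > max_unfragmented_payload(path_mtu, ipv6)


if __name__ == "__main__":
    print(max_unfragmented_payload(1500, ipv6=False))  # 1472
    print(max_unfragmented_payload(1500, ipv6=True))   # 1452
    print(max_unfragmented_payload(1280, ipv6=True))   # 1232
    print(needs_fragmentation(1600, 1500, ipv6=True))  # True
```

With the IPv6 minimum MTU of 1280 bytes the largest unfragmented payload is 1232 bytes, which is the EDNS buffer size
DNS Flag Day 2020 recommended for exactly this reason.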

Drop ICMP PTB (Packet Too Big). This breaks PMTU discovery, which particularly affects IPv6 UDP responses getting
through, as the sender needs to fragment. It also stops TCP responses where the MSS and PMTU don’t align. MSS fix-up
wouldn’t be needed if ICMP PTB messages weren’t blocked and were consistently generated.
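The MSS/PMTU interaction can be sketched the same way. This is a simplified model, assuming no IP or TCP options and
that routers will not fragment in transit (always true for IPv6; true for IPv4 when DF is set, as PMTU discovery
does). The 1480-byte tunnel in the example is hypothetical.

```python
TCP_HEADER = 20                 # assumes no TCP options
IP_HEADER = {4: 20, 6: 40}      # assumes no IPv4 options / IPv6 extension headers


def mss_for_mtu(mtu: int, ip_version: int) -> int:
    """MSS a host derives from a link MTU."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER


def full_segments_blackholed(mss: int, path_mtu: int, ip_version: int,
                             ptb_delivered: bool) -> bool:
    """True if full-size segments are too big for the path and the sender
    never finds out, because the ICMP PTB that would tell it is dropped."""
    packet_len = mss + IP_HEADER[ip_version] + TCP_HEADER
    return packet_len > path_mtu and not ptb_delivered


if __name__ == "__main__":
    # Both ends sit on 1500-byte Ethernet, so each advertises MSS 1440 over IPv6...
    mss = mss_for_mtu(1500, 6)
    # ...but a hypothetical 1480-byte tunnel sits in the middle of the path.
    print(full_segments_blackholed(mss, 1480, 6, ptb_delivered=False))  # True
    print(full_segments_blackholed(mss, 1480, 6, ptb_delivered=True))   # False
    # MSS fix-up (clamping to the tunnel MTU) papers over the blocked PTB.
    print(full_segments_blackholed(mss_for_mtu(1480, 6), 1480, 6, False))  # False
```

Small answers fit either way; it is the large responses, the ones pushed to TCP in the first place, that fall into
the black hole.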

Filter out every query type except a handful that are magically blessed. The firewalls that do this are often years
behind the current query mix, and DNS servers don’t need this service anyway: they know how to return “this record
does not exist”. Additionally, if you have added the record to the zone, you don’t need a firewall second-guessing
your desires.
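A toy model of the query-type allowlist problem, in Python. The allowlist contents and the zone are hypothetical; the
type codes are the IANA-assigned values (SVCB and HTTPS, types 64 and 65, are recent examples an old allowlist would
not know about).

```python
# RR type codes from the IANA DNS parameters registry.
RR_TYPE = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12, "MX": 15,
           "TXT": 16, "AAAA": 28, "SRV": 33, "DS": 43, "DNSKEY": 48,
           "SVCB": 64, "HTTPS": 65, "CAA": 257}

# Hypothetical firewall allowlist, frozen years ago.
FIREWALL_ALLOWED = {RR_TYPE[t] for t in
                    ("A", "NS", "CNAME", "SOA", "PTR", "MX", "TXT", "AAAA")}


def resolve(qtype: int, zone_types: set[int], firewall: bool) -> str:
    """Outcome of a query for an existing name in the zone."""
    if firewall and qtype not in FIREWALL_ALLOWED:
        return "TIMEOUT"   # silently dropped; the client retries, then gives up
    if qtype in zone_types:
        return "ANSWER"
    return "NODATA"        # the server already knows how to say "no such record"


if __name__ == "__main__":
    zone = {RR_TYPE["A"], RR_TYPE["AAAA"], RR_TYPE["HTTPS"]}
    print(resolve(RR_TYPE["HTTPS"], zone, firewall=False))  # ANSWER
    print(resolve(RR_TYPE["HTTPS"], zone, firewall=True))   # TIMEOUT
    print(resolve(RR_TYPE["CAA"], zone, firewall=False))    # NODATA
```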

Block DNS over TCP. DNS has ALWAYS used both UDP and TCP for normal queries. There have been plenty of times where
UDP responses have said to retry over TCP because the answer is too big, only for the TCP request to be blocked because
of the myth that DNS is UDP-only.
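For reference, the “retry over TCP” signal is the TC (truncation) bit in the DNS header flags (RFC 1035, section
4.1.1). A minimal Python sketch of the client-side check; the helper names are illustrative, not taken from any
particular resolver.

```python
import struct

TC_FLAG = 0x0200  # TC bit in the 16-bit flags word of the DNS header


def is_truncated(udp_response: bytes) -> bool:
    """True if the server set TC, meaning 'answer too big, ask again over TCP'."""
    if len(udp_response) < 12:
        raise ValueError("shorter than a 12-byte DNS header")
    _msg_id, flags = struct.unpack("!HH", udp_response[:4])
    return bool(flags & TC_FLAG)


def next_step(udp_response: bytes) -> str:
    # A firewall that blocks TCP/53 turns the "retry over TCP" branch into a failure.
    return "retry over TCP" if is_truncated(udp_response) else "use UDP answer"


if __name__ == "__main__":
    # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    # 0x8380 = QR | TC | RD | RA; 0x8180 = QR | RD | RA.
    truncated = struct.pack("!6H", 0x1234, 0x8380, 1, 0, 0, 0)
    complete = struct.pack("!6H", 0x1234, 0x8180, 1, 1, 0, 0)
    print(next_step(truncated))  # retry over TCP
    print(next_step(complete))   # use UDP answer
```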

Run out of state tracking.  Recursive servers make hundreds of queries per incoming query when their caches are empty. 
We’ve seen connection tracking tables overwhelmed often. 
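A back-of-the-envelope way to see how quickly a stateful box in front of a resolver fills up, using Little’s law
(entries in the table = arrival rate × how long each entry lives). The query rate, the fan-out and the 30-second UDP
timeout below are assumptions picked for illustration, and the sketch assumes every query uses a fresh source port and
so gets its own connection-tracking entry.

```python
def tracked_flows(client_qps: float, upstream_per_query: float,
                  udp_timeout_s: float) -> float:
    """Steady-state connection-tracking entries (Little's law: L = lambda * W).

    Each client query creates one flow, and each upstream query the resolver
    sends on its behalf creates another; all of them linger for udp_timeout_s.
    """
    client_flows = client_qps * udp_timeout_s
    upstream_flows = client_qps * upstream_per_query * udp_timeout_s
    return client_flows + upstream_flows


if __name__ == "__main__":
    # Hypothetical ISP resolver: 2,000 client queries/s, 30 s UDP state timeout.
    warm = tracked_flows(2_000, 0.5, 30)   # mostly cache hits
    cold = tracked_flows(2_000, 100, 30)   # empty cache, e.g. right after a restart
    print(f"warm cache: {warm:,.0f} entries")  # warm cache: 90,000 entries
    print(f"cold cache: {cold:,.0f} entries")  # cold cache: 6,060,000 entries
```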

Stupid firewalls that “know” that this bit is 0, or that this type never appears in this section, or that there
aren’t any EDNS options in requests, or that drop requests with unknown EDNS options. Nameservers have rules for
dealing with the unknown, and they are infinitely better than dropping the request.
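What “rules for dealing with the unknown” look like for EDNS: RFC 6891 says option codes that are not understood are
to be ignored rather than treated as an error. A minimal Python sketch of walking OPT RDATA (a sequence of 2-byte code,
2-byte length, data) that way; the set of “known” options is an arbitrary choice for illustration.

```python
import struct

# A few IANA EDNS option codes; anything else is "unknown" to this sketch.
KNOWN_OPTIONS = {3: "NSID", 8: "ECS", 10: "COOKIE", 12: "Padding"}


def parse_opt_rdata(rdata: bytes) -> tuple[dict[str, bytes], list[int]]:
    """Split OPT RDATA into understood options and ignored option codes.

    Unknown codes are skipped using their length field. Only structurally
    broken RDATA is treated as an error here.
    """
    understood: dict[str, bytes] = {}
    ignored: list[int] = []
    offset = 0
    while offset < len(rdata):
        if offset + 4 > len(rdata):
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", rdata, offset)
        offset += 4
        if offset + length > len(rdata):
            raise ValueError("truncated option data")
        data = rdata[offset:offset + length]
        offset += length
        if code in KNOWN_OPTIONS:
            understood[KNOWN_OPTIONS[code]] = data
        else:
            ignored.append(code)  # ignore it; don't drop the whole request
    return understood, ignored


if __name__ == "__main__":
    # An 8-byte client COOKIE followed by a made-up option code 65001.
    rdata = (struct.pack("!HH", 10, 8) + bytes(8)
             + struct.pack("!HH", 65001, 2) + b"hi")
    print(parse_opt_rdata(rdata))
```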

I’m sure there are other stupidities I’ve seen firewalls do. Juniper were particularly bad until we complained enough 
to get the defaults changed. 
-- 
Mark Andrews

On 10 Aug 2025, at 3:45, Mel Beckman via NANOG <nanog () lists nanog org> wrote:

Saku,

Thanks for the well-delineated examples. I agree with them. You clearly illustrate wrong configurations that can cause
unanticipated failure modes. Thus it’s best to follow established design patterns, rather than cooking your own
recipe.

But how is this different from using a firewall to protect any other service? Firewalls can fail, and thus require
resiliency considerations. But they can also do a lot to insulate underlying services from attacks (source IP
flooding, for example, or the myriad sequence attacks), the kinds of attacks that are difficult to protect against
in the pure IP stack.

I submit that one major firewall advantage is consistency of implementation. People who are protecting their DNS by 
cleverly hardening it using packet filters and load balancing are doing so with error-prone manual methods. Human
error, as HAL says, is always a problem. Firewall code, on the other hand, goes through certification processes and 
deep regression testing before being deployed. Firewall developers are dedicated to the protection mission, while 
people standing up DNS at many enterprises, including ISPs, are not DNS experts. DNS is just one of many services 
they must manage.

I appreciate your anecdotes, but as every good scientist knows, the plural of anecdote is not data. I need to see 
some data backing up these claims about the relative unreliability of firewalls.

-mel

On Aug 9, 2025, at 8:41 AM, Saku Ytti via NANOG <nanog () lists nanog org> wrote:

On Sat, 9 Aug 2025 at 15:42, Måns Nilsson via NANOG <nanog () lists nanog org> wrote:

I suppose you bring the beer then, because it's going to take both to
endure the cringefest that is "cascading resource exhaustion in DNS /
firewall setup" -- it can pretty fast end up snowballing completely out
of hand. Don't ask me how I know without picking up the bar tab.

I can share lessons from personal mistakes.

a) A FW is always an additional fuse in front of the service; the failure
modes are the union of the FW's and the service's, so MTBF is lower and
MTTR is higher
  - state establishment rate is reduced
  - state count is reduced
  - either the FW has protocol intelligence and occasionally (as
protocols evolve, or when more exotic use cases exist) drops valid
protocol packets, or it is protocol-unintelligent and adds nothing over
a stateless HW-based filter on the edge router
  - any service protected by a FW is easier to DoS than the same
service without a FW
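Point (a) is the usual series-reliability arithmetic. A minimal Python sketch, assuming independent components with
constant failure rates, so that in series the failure rates add; the MTBF and MTTR figures are invented.

```python
def series_mtbf(*mtbf_hours: float) -> float:
    """MTBF of components in series: any one failing takes the service down.
    With independent, constant failure rates, the rates (1/MTBF) add."""
    return 1.0 / sum(1.0 / m for m in mtbf_hours)


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state fraction of time a component is up."""
    return mtbf_h / (mtbf_h + mttr_h)


if __name__ == "__main__":
    dns_mtbf, dns_mttr = 8760.0, 1.0  # hypothetical: one 1-hour outage a year
    fw_mtbf, fw_mttr = 8760.0, 6.0    # hypothetical: fixing it needs a vendor ticket
    print(round(series_mtbf(dns_mtbf, fw_mtbf)))  # 4380: twice as many failures
    alone = availability(dns_mtbf, dns_mttr)
    behind_fw = alone * availability(fw_mtbf, fw_mttr)  # series: availabilities multiply
    print(f"service alone {alone:.5f}, behind FW {behind_fw:.5f}")
```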

b) Even if a FW is run (e.g. in front of a corporate LAN, which doesn't
have to deal with denial-of-service issues, and where a regulator, PCI
or equivalent may require a FW), the valid configurations in my mind are
  - where 2 == cluster, 1 == single device and + == routing separation
  - 1, 1+1, 2+1 are valid configurations
  - 2 and 2+2 are invalid configurations
  - every time I've run '2', there has eventually been a case where the
cluster is dead and MTTR is high, as the vendor needs to be engaged and,
depending on the hour, the people at the vendor who can actually
troubleshoot the issue are not at work (it used to be US hours; now the
experts are increasingly on India time)
  - So if you can only afford 2 devices, have two devices separated
by routing: you'll lose state during a failure, but you'll have fewer
failures. Even if you can afford 4 devices, don't buy two clusters,
since the problem that breaks one cluster may affect both clusters
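One way to see why '2' and '2+2' come out worse than '1+1' and '2+1': give each device an independent outage
probability p, and give a cluster an extra common-mode outage probability q (cluster software bug, split brain) that
takes out both members at once and, for '2+2', both clusters, since they run the same code. This is a toy Python model
with invented numbers, not a measurement, and it ignores the MTTR difference described above.

```python
def p_cluster_down(p: float, q: float) -> float:
    """Two-node cluster: down if a common-mode fault hits (q) or both nodes fail."""
    return 1 - (1 - q) * (1 - p * p)


def outage_probability(config: str, p: float, q: float) -> float:
    """Outage probability for the configurations named above (toy model)."""
    if config == "1":
        return p
    if config == "1+1":
        return p * p                     # independent boxes, separated by routing
    if config == "2":
        return p_cluster_down(p, q)
    if config == "2+1":
        return p_cluster_down(p, q) * p  # the single box doesn't share the cluster's fault
    if config == "2+2":
        # The fault that breaks one cluster breaks the other too (same code, same trigger).
        return q + (1 - q) * (p * p) ** 2
    raise ValueError(f"unknown configuration {config!r}")


if __name__ == "__main__":
    p, q = 0.01, 0.001  # invented per-device and common-mode outage probabilities
    for config in ("1", "1+1", "2", "2+1", "2+2"):
        print(f"{config:>4}: {outage_probability(config, p, q):.2e}")
```

With these made-up numbers '1+1' comes out at 1e-4 while '2' and '2+2' sit near 1e-3: the common-mode term q is a
floor that adding more clustered boxes cannot get under.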


Generally a FW is needed if what is behind the FW has dubious and unknown
state (like a user LAN). But if what is behind the FW is a well-thought-out
DNS or HTTP service, the FW adds no utility and a lot of liability.

--
++ytti
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/DZZOX5JWSLEGYIO45G5HUUYXT5RXHRD2/
