nanog mailing list archives

Re: Recommended DNS server for a medium 20-30k users isp


From: John Todd via NANOG <nanog () lists nanog org>
Date: Thu, 07 Aug 2025 21:41:36 -0700



On 7 Aug 2025, at 20:53, brent saner via NANOG wrote:

On Thu, Aug 7, 2025, 20:45 DurgaPrasad - DatasoftComnet via NANOG <
nanog () lists nanog org> wrote:

Hello all,
Do you have any recommendations for recursive DNS servers for a medium
sized (20-30k users) ISP.
We have used powerdns and unbound but sometimes find the caching times a
bit on upper side. Any suggestions between these two or anything new?
Also need points on how much we tune the settings
pros and cons if any.

Thank you /DP

<https://lists.nanog.org/archives/list/nanog () lists nanog org/message/SUTKDISSISPWQY3YGF25FBQNN2JD5HDP/>


It's surprising that you didn't get the performance you hoped for out of
PowerDNS. You already tried the suggestions in their tuning guide[0], I'm
assuming?

You may also want to load in entire zones to the hot cache[1].

And there's always horizontal scaling; sometimes you just plain hit limits
on vertical scale.

I haven't tried it yet, but dnsdist[2] should let you do this.
(Or keepalived and/or HAproxy, or... etc. Any loadbalancer that can handle
raw TCP and UDP.)
Dnsdist in particular seems explicitly targeted towards a large set of
untrusted clients with additional optional "safeguarding/consumer
protection" features. Quad9 uses it in some fashion, if I recall correctly.

[0] https://doc.powerdns.com/recursor/performance.html
[1] https://docs.powerdns.com/recursor/lua-config/ztc.html
[2] https://www.dnsdist.org/index.html


You beat me to it - dnsdist is an exceptionally robust solution for front-ending recursive (or authoritative) servers.
Quad9 is indeed using it for all our recursive systems, and we split traffic on the "back-end" between PowerDNS
Recursor and Unbound.  dnsdist has a "packet cache" feature which handles much of the load once warmed, and it
answers on DoT/DoH as well as providing a very rich set of tooling for managing unwanted behaviors.
The combination of dnsdist plus a good recursive resolver should easily handle 30k users on a single modest
chassis, though of course there are very good reasons to have several similarly configured systems in
fail-over models using ECMP or your favorite routing protocol.  (Hot caches work better - try not to spread load too
much.)  At this point, I can't imagine running a recursive system that is open to anything other than a tiny number of
users without ensuring that dnsdist is in front of it - it's exactly the right thing and has been sandblasted by a lot
of trial-and-error to make it fast and reliable with lots of features for ISP environments.
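
As a rough illustration of why a warmed packet cache absorbs so much load, here is a toy Python sketch of the
general idea: responses are stored per (qname, qtype) and served until they expire, with an upper bound on how long
anything may be cached. This is not dnsdist's implementation or API; the class name, the max_ttl parameter and the
injectable clock are all hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PacketCache:
    """Toy response cache keyed on (qname, qtype). Illustrative only."""

    def __init__(self, max_ttl: int = 86400,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_ttl = max_ttl      # hypothetical cap on caching time, in seconds
        self.clock = clock          # injectable so expiry can be tested
        self.entries: Dict[Tuple[str, str], Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, qname: str, qtype: str) -> Optional[bytes]:
        """Return a cached response, or None if absent or expired."""
        key = (qname.lower(), qtype)    # DNS names are case-insensitive
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.entries.pop(key, None)     # drop the expired entry, if any
        self.misses += 1
        return None

    def put(self, qname: str, qtype: str, response: bytes, ttl: int) -> None:
        """Store a response, never for longer than max_ttl seconds."""
        expires = self.clock() + min(ttl, self.max_ttl)
        self.entries[(qname.lower(), qtype)] = (expires, response)
```

On a miss, the front end would forward the query to a back-end resolver and put() the answer. Capping the TTL is one
way to bound how long stale answers can linger, at the cost of more back-end queries.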

If a decent-sized system doesn't seem fast, some other underlying issue may be at the root of the perceived
slowness. There is useful data that can be pulled out of dnsdist with Prometheus-style outputs - I would suggest
instrumenting things and seeing where the problems are.
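
A minimal Python sketch of that kind of instrumentation: it parses Prometheus text-format output and computes a cache
hit ratio. The metric names dnsdist_cache_hits and dnsdist_cache_misses are assumptions here and should be checked
against what your dnsdist version actually exports; fetching the text from the metrics endpoint is left out.

```python
from typing import Dict, Optional


def parse_prometheus(text: str) -> Dict[str, float]:
    """Parse samples from Prometheus text exposition format.

    Keys are the metric name including any {label} block, verbatim.
    Comment lines (# HELP / # TYPE) and unparsable lines are skipped.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: the name runs up to the last closing brace.
            name, _, rest = line.rpartition("}")
            name += "}"
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            samples[name] = float(fields[0])   # ignore optional timestamp
        except ValueError:
            continue
    return samples


def cache_hit_ratio(samples: Dict[str, float],
                    hits: str = "dnsdist_cache_hits",        # assumed name
                    misses: str = "dnsdist_cache_misses"     # assumed name
                    ) -> Optional[float]:
    """Return hits / (hits + misses), or None if there is no traffic yet."""
    h = samples.get(hits, 0.0)
    m = samples.get(misses, 0.0)
    total = h + m
    return h / total if total else None
```

A persistently low ratio on a warmed system would suggest looking at cache sizing or at how widely the load is being
spread, before blaming the back-end resolvers.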

Now, the original question of "points on how much we tune the settings" - that is a much longer discussion, but 
honestly you can get to 80% goodput without too much fiddling.

JT
_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/J4WSKWYCIV7KTCVWXDWT64IGHKQZHERB/

