default local DNS failover solution needed, nscd?

Chuck Anderson <cra@xxxxxxx> · Fri, 25 Apr 2014 18:51:26 -0400

I'm starting a new thread to clarify and emphasize the problem I'm
actually trying to solve.  Here is the problem restated as I posted it
to the dns-operations list:

-----
Is it really expected that the first DNS server listed in
/etc/resolv.conf should never go down?  Operationally speaking, who
can actually rely on listing multiple nameservers in /etc/resolv.conf
and using libc's failover mechanism in any kind of production server?
Because the failover behavior in libc is atrocious--each new or
existing process has to re-do the failover after timing out, and even
long-running processes have to call res_init() to re-read resolv.conf.
It seems that the only sensible way to run a datacenter (or a network
full of Linux workstations for that matter) is to either:

1. Make sure the first nameserver listed in resolv.conf never goes
   down by using Anycast DNS or some other failover mechanism like
   VRRP or CARP on the DNS server side.

or:

2. Use a local DNS daemon on every server with forwarders configured
   to the network's nameservers, and fix resolv.conf to 127.0.0.1.
-----

(I've since learned that nscd can be a third option.)
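
To put rough numbers on the failover cost described above, here is a
simplified model, not glibc's actual code.  It assumes the documented
resolv.conf(5) default of a 5 second timeout and that servers are tried
in listed order; the function names and the 10 ms answer time are made
up for illustration.

```python
# Simplified model of stub-resolver failover; NOT glibc's real algorithm.
# Assumption: each dead server costs one full timeout (default 5 s per
# resolv.conf(5)) before the next listed server is tried.

def lookup_delay(servers_up, timeout=5.0, answer_time=0.01):
    """Seconds one lookup takes when servers are tried in listed order.

    servers_up: list of booleans, one per nameserver in resolv.conf order.
    Returns None if no server answers in a single pass over the list.
    """
    delay = 0.0
    for up in servers_up:
        if up:
            return delay + answer_time
        delay += timeout
    return None


def total_delay(n_lookups, servers_up, remembers_failure, timeout=5.0):
    """Total time for n_lookups, with or without memory of dead servers."""
    first_live = servers_up.index(True)  # raises ValueError if all are down
    total = 0.0
    for i in range(n_lookups):
        if remembers_failure and i > 0:
            # A resolver with memory goes straight to the known-good server.
            total += lookup_delay(servers_up[first_live:], timeout)
        else:
            # A stateless stub pays the timeout again on every lookup.
            total += lookup_delay(servers_up, timeout)
    return total
```

Under these assumptions, 100 lookups with the first of two servers down
cost about 501 seconds without any memory of the failure, versus about
6 seconds when the failure is remembered after the first lookup.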

On Fri, Apr 25, 2014 at 07:19:17PM +0200, Petr Spacek wrote:
> On 25.4.2014 18:19, Simo Sorce wrote:
> >On Fri, 2014-04-25 at 09:56 -0600, Pete Zaitcev wrote:
> >>On Thu, 10 Apr 2014 10:41:54 -0400
> >>Chuck Anderson <cra@xxxxxxx> wrote:
> >>
> >>>[...]  We need an independent,
> >>>system-wide DNS cache, and always point resolv.conf to 127.0.0.1 to
> >>>solve this fundamental design problem with how name resolution works
> >>>on a Linux system.  Windows has had a default system-wide DNS cache
> >>>for over a decade.  It is about time that Linux catches up.
> >>
> >>I observe you pointedly ignore the existence of nscd (which does not
> >>require any changes to resolv.conf). Why is that?

Ignorance about nscd on my part.  Please tell me more.  What are the
honest pros/cons to using nscd?  Are there still big enough problems
with nscd to warrant its poor reputation?

> >nscd is ... bad

I've since learned more about nscd.  Apparently its reputation may be
undeserved, at least for the newer versions in glibc.  I have no direct
experience, but I finally found a good thread about fixing the stub
resolver that addresses people's unwillingness to use nscd as well as
some other things that could be done, such as a patch apparently
carried by Debian and Ubuntu that improves detection of changes to
resolv.conf:

https://sourceware.org/ml/libc-alpha/2012-12/msg00416.html
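
As an illustration of what "detecting changes to resolv.conf" can mean,
here is a minimal sketch that re-reads the file only when its
modification time changes.  This is one possible approach, not taken
from the patch mentioned above; the class name ResolvConfWatcher and
its methods are hypothetical.

```python
import os


class ResolvConfWatcher:
    """Re-read a resolv.conf-style file only when its mtime changes."""

    def __init__(self, path="/etc/resolv.conf"):
        self.path = path
        self._mtime = None      # mtime (ns) seen at the last successful parse
        self.nameservers = []

    def refresh(self):
        """Return True if the file was (re)parsed, False if unchanged."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self.path) as f:
            # Keep only the address from each "nameserver <addr>" line.
            self.nameservers = [
                parts[1]
                for parts in (line.split() for line in f)
                if len(parts) >= 2 and parts[0] == "nameserver"
            ]
        self._mtime = mtime
        return True
```

A long-running process could call refresh() before each lookup; the
stat() is cheap compared to a timed-out query against a stale server
list.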

> Main goal is to have local DNSSEC-validating resolver.

I, as the OP, did not intend that as the goal, although I have no
problem with that as a different goal.  My intent was to fix the
atrocious failover behavior of the glibc resolver.  I also don't mind
using a caching resolver BUT there should be a better stub resolver
that can be widely deployed in a default configuration that doesn't
require a local caching resolver to paper over its deficiencies.
Maybe nscd, along with some of the other ideas in the link I posted,
is part of the solution.

Basically, we aren't going to win the war by suggesting that everyone
should run a DNSSEC-validating resolver everywhere.  But maybe we can
get widespread consensus for having a lightweight daemon that just
does failover correctly and nothing else fancy so that people won't
mind it running by default on Server, Workstation, Cloud, etc.  Maybe
nscd can be that daemon, or maybe something else needs to be written.
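
To make "does failover correctly" a bit more concrete, here is a rough
sketch of the one piece of state such a daemon would need: a shared
memory of which upstream servers failed recently.  Everything here is
hypothetical (the FailoverSelector name, the 30 second hold-down); it
is not how nscd or any existing daemon works.

```python
import time


class FailoverSelector:
    """Order upstream servers so recently failed ones are tried last."""

    def __init__(self, servers, holddown=30.0, clock=time.monotonic):
        self.servers = list(servers)
        self.holddown = holddown    # seconds to distrust a failed server
        self.clock = clock          # injectable so tests can fake time
        self._failed_at = {}        # server -> time of last observed failure

    def mark_failed(self, server):
        self._failed_at[server] = self.clock()

    def mark_ok(self, server):
        self._failed_at.pop(server, None)

    def candidates(self):
        """Healthy servers first (configured order), then held-down ones."""
        now = self.clock()
        healthy, suspect = [], []
        for s in self.servers:
            failed = self._failed_at.get(s)
            if failed is not None and now - failed < self.holddown:
                suspect.append(s)
            else:
                healthy.append(s)
        return healthy + suspect
```

Because the state lives in one daemon rather than in each process, a
timeout observed by one client benefits every later lookup until the
hold-down expires, at which point the first server is tried again.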

Whatever the solution to DNS failover, we should be sure it works
correctly alongside, and doesn't get in the way of, full caching or
DNSSEC-validating resolvers, both local and remote, whether or not
they are installed or enabled by default in the various Fedora
products.
-- 
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct