* Petr Menšík:

>> Fedora made the decision to promote systemd-resolved as a local DNS
>> cache. To me, that means that we can gradually remove other DNS caches
>> from the distribution.

> I maintain also dnsmasq and I doubt there is reason to remove it from
> the distribution. I would oppose if anyone intended to do it.

dnsmasq has other uses in the distribution, see e.g. libvirt. We don't
quite see that for nscd anymore.

>> There seem to be a lot of misconceptions about what nscd does and which
>> benefits it brings (see the claim about increased privacy). So I think
>> it's important to be precise here.

> I expect it would only cache simple name:ip pairs, nothing more. No, I
> doubt nscd can bring any additional privacy.

Ahem. Name/sets of addresses, which covers negative caching as well.

>> From my point of view, nscd is not a very satisfactory solution for DNS
>> caching because it can't do DNSSEC, it can't do recursion, it can't do
>> prefetching, it doesn't have a good way to detect dead servers, it can't
>> inject local stub zones, and so on.

> We can argue whether people need DNSSEC. Systemd-resolved cannot work
> with it correctly and it actively BREAKS its usage. Just like dnsmasq,
> nscd just caches and no more. It usually does not break anything. I
> think it preserves most features of libnss_dns.so behaviour. No
> recursion, dead servers detection or injecting local zones is required.
> It is not done without the cache anyway. Dead servers detection could be
> improved I think.

Dead server detection is what people who have migrated from other
systems expect. dnsmasq has a rudimentary form of it:

  /* In strict_order mode, always try servers in the order
     specified in resolv.conf, if a domain is given
     always try all the available servers,
     otherwise, use the one last known to work. */

I think that probably covers 80% of the use cases.
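The "use the one last known to work" policy from that comment can be sketched roughly as follows. This is a minimal illustration in Python, not dnsmasq's actual code: the class name `UpstreamSelector`, the `retry_after` parameter and its 30-second default are all assumptions made for the example, and strict_order mode is not modelled.

```python
import time


class UpstreamSelector:
    """Hypothetical helper: prefer the upstream server last known to work."""

    def __init__(self, servers, retry_after=30.0, clock=time.monotonic):
        if not servers:
            raise ValueError("at least one upstream server is required")
        self.servers = list(servers)    # order as listed in resolv.conf
        self.retry_after = retry_after  # assumed back-off before retrying a dead server
        self.clock = clock              # injectable so the logic can be tested
        self.last_good = None           # server that answered most recently
        self.dead_since = {}            # server -> time of its last failure

    def candidates(self):
        """Return the servers in the order they should be tried."""
        now = self.clock()
        alive = [
            s for s in self.servers
            if now - self.dead_since.get(s, float("-inf")) >= self.retry_after
        ]
        dead = [s for s in self.servers if s not in alive]
        # Move the last known good server to the front, if it is still usable.
        if self.last_good in alive:
            alive.remove(self.last_good)
            alive.insert(0, self.last_good)
        # Servers marked dead remain available as a last resort.
        return alive + dead

    def report(self, server, ok):
        """Record the outcome of a query sent to `server`."""
        if ok:
            self.last_good = server
            self.dead_since.pop(server, None)
        else:
            self.dead_since[server] = self.clock()
            if self.last_good == server:
                self.last_good = None
```

Note that the state lives in one process here, so a short-lived process starts from scratch every time.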
(This really needs a separate daemon that centralizes the state, so that
processes that just do one DNS query and exit can pick the right
upstream server for the query.)

>> I also think that not changing /etc/resolv.conf isn't a feasible goal
>> because that's the file applications use to locate the system DNS
>> resolver if they can't use the glibc interfaces for some reason.

> Sure. If they can't use glibc interface, they would not use nscd. That
> clearly its advantage!

Typically, they still want a reliable DNS service, and in some
deployment scenarios, that basically needs a local caching stub that
does some rudimentary form of dead-server detection.

> It does not have to implement dnssec or edns0, because getaddrinfo api
> does not include such flags. If clients needs advanced usage, unlike
> systemd-resolved, it does not stand in the way. I think this is the
> greatest advantage. Advanced uses can work around only a simple
> service without problem. They don't collide.

They still suffer from the lack of a local cache. I think the local
cache is highly beneficial and important to Fedora, so I'm excited that
we finally got one by default in Fedora 33.

>> The big one is the general cache instability:
>>
>>   nscd: Concurrency issues with cache.
>>   <https://sourceware.org/bugzilla/show_bug.cgi?id=25888>
>>
>> (Internal bug #1172792.)

> This bug reminds me bug #1740511 [1], which was very hard for me. Later,
> mlichvar discovered real reason for it. Atomic operations required
> different flags to atomic operations. ppc64le platform has different
> memory ordering than x86_64, where it worked flawlessly. It crashed
> often just on ppc64le. Our fix was to switch to memory_order_acquire,
> where integrity was enforced properly. I have seen relaxed in bug
> attached patches, would recommend checking it out.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1740511
> https://en.cppreference.com/w/c/atomic/memory_order

We had someone look at this who literally has a PhD in this area
(software transactional memory). And yet here we are. 8-(

>> Related to DNS data, there are bunch of issues that need investigating
>> or fixing:
>>
>>   getaddrinfo drops ipv6 V4MAPPED addresses from ncsd results
>>   <https://sourceware.org/bugzilla/show_bug.cgi?id=26630>
>>
>>   Problems with nscd and systemd-resolved interactions on IPv6 network.
>>   <https://sourceware.org/bugzilla/show_bug.cgi?id=23546>
>>
>>   nscd doesn't cache record containing more than one IP address.
>>   <https://sourceware.org/bugzilla/show_bug.cgi?id=15862>
>>
>>   Reload nscd cache entry even if its timeout is equal to the current time
>>   <https://sourceware.org/bugzilla/show_bug.cgi?id=13931>
>>
>>   hosts caching does not respect TTL, and caches old IP's
>>   <https://sourceware.org/bugzilla/show_bug.cgi?id=4428>

> Is there any design document about nscd?

Not really, it's necessary to reconstruct this from the implementation.

> How are interfaces to cache implemented from glibc? Is connection to
> nscd hardcoded in gethostbyname and getaddrinfo functions?

It's code under USE_NSCD conditionals. The NSS functions in glibc are
mostly generated through template files put through the C preprocessor
(nss/getXXbyYY_r.c).

It's certainly non-trivial to fix all this. And Fedora still defaults
to systemd-resolved even if we continue to ship a fixed nscd.

As I said before, the hosts cache is the weakest part of nscd, and where
many alternatives exist, so if you approach nscd from this angle, I'm
not sure if this is a good way to spend your resources.
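For reference, the behaviour that the TTL-related reports in the bug list expect from a hosts cache can be sketched as follows. This is a minimal Python illustration under stated assumptions, not how nscd is implemented: `HostCache`, `negative_ttl` and its 20-second default are hypothetical names and values. It stores a name together with its whole set of addresses, caches an empty answer as a negative entry, and treats an entry whose expiry equals the current time as expired, which is my reading of what the title of bug 13931 refers to.

```python
import time


class HostCache:
    """Hypothetical cache: name -> set of addresses, honouring per-entry TTLs."""

    def __init__(self, negative_ttl=20.0, clock=time.monotonic):
        self.negative_ttl = negative_ttl  # assumed lifetime of "no such host" entries
        self.clock = clock                # injectable so expiry can be tested
        self._entries = {}                # name -> (expiry time, tuple of addresses)

    def store(self, name, addresses, ttl):
        """Cache all addresses for `name`; an empty list is a negative entry."""
        addresses = tuple(addresses)
        if not addresses:
            ttl = self.negative_ttl
        self._entries[name] = (self.clock() + ttl, addresses)

    def lookup(self, name):
        """Return the cached addresses, () for a cached negative answer,
        or None if there is no valid entry and the caller must query upstream."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        expiry, addresses = entry
        # ">=" means an entry expiring exactly now is already stale.
        if self.clock() >= expiry:
            del self._entries[name]
            return None
        return addresses
```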
Thanks,
Florian