On 11/17/20 10:12 AM, Lennart Poettering wrote:
> On Mo, 16.11.20 21:48, Petr Menšík (pemensik@xxxxxxxxxx) wrote:
>
>> But it does not have to learn everything about a server, because it
>> switched the active one. If it has to, try to find a way to store server
>> instance features per server IP, not per link.
>
> We do exactly this. But we also have a grace period after which we
> forget everything again, and go back to the best server feature level,
> which then takes some time to settle back on the actual server feature
> level.

I think smart RTT checking every few minutes would be better than always
forgetting everything about a server that responds quickly.

> When we always use the same server, then we probe it once, and use it
> for the full grace period without any further delays. When we use
> numerous servers, and a different one for each lookup, then this means
> we have to probe each and every single one of them once and that slows
> down things.

You can cache the average response time for each server. If an answer did
not arrive within triple the usual RTT, send the query to other servers
too. It should not be done for each request; it depends on how long the
grace period is, right?

> It's a very easy calculation: if you use n=1 servers for 500 lookups
> within the grace period, you experience 1 slow lookup in the worst
> case that required the feature probing, plus 499 speedy lookups
> because we already knew the earlier probing results. If otoh you use
> n=250 servers for 500 lookups you experience 250 worst case slow
> lookups, since we need to learn for each server individually what it
> can and can't do, plus 250 speedy lookups. And then, after the grace
> period is over, you get another 250 slow lookups...

I would think you can detect almost everything from 2 requests and
replies. Only special workarounds for broken implementations may take
more. It could work with a timeout that is barely visible to the user:
send a few queries to a secondary server, measuring its average
round-trip time.
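To make the idea concrete, the per-server RTT caching I have in mind could look roughly like this Python sketch. The class, the smoothing factor, and the "triple the usual RTT" threshold are my own illustration, not code from resolved or any real resolver:

```python
class ServerStats:
    """Track a smoothed round-trip time per server IP (hypothetical
    sketch, names and parameters are made up for illustration)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha    # EWMA smoothing factor for new samples
        self.avg_rtt = {}     # server IP -> smoothed RTT in seconds

    def record(self, server, rtt):
        """Fold a new RTT sample into the running average for a server."""
        prev = self.avg_rtt.get(server)
        self.avg_rtt[server] = rtt if prev is None else (
            self.alpha * rtt + (1 - self.alpha) * prev)

    def deadline(self, server, factor=3.0, default=0.5):
        """How long to wait before also asking other servers: triple
        the usual RTT, or a conservative default for unknown servers."""
        prev = self.avg_rtt.get(server)
        return default if prev is None else factor * prev
```

With something like this, a server that has always answered in 20 ms gets a 60 ms deadline before the resolver starts asking others, while an unmeasured server gets a generous default.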
If it was significantly better than the current server, switch to it.
Unless the connection changed, you can expect the previous quirks to
still apply, but you can still test response time. I think dnsmasq
occasionally sends a few requests to non-primary servers. If answers
don't arrive within hundreds of ms, it tries the others as well. So the
user gets an answer soon, and the response time of all used servers has
been measured, even if some of them are offline. It does not have to be
50% of queries just to test them.

>>> It might be something to add as opt-in, and come with the warning that
>>> you better list DNS servers that aren't crap if you want to use that,
>>> so that we never have to downgrade protocol level, and thus the
>>> learning phase is short.
>>
>> Sure enough, many router DNS implementations are bad or ugly. If it can
>> choose from a full featured validated ISP resolver and a crappy router
>> implementation, prefer the one with better features. Most likely it is
>> much better maintained as well.
>
> I am sorry, but that is not a suitable approach for "edge" DNS
> clients. We need to go through the DNS server info we acquired through
> DHCP or so, since private domains do exist. We must keep router admin
> pages accessible.

I haven't mentioned using any external DHCP client. The problem is,
there is no standard way to configure split DNS over DHCP, or even to
declare the private domains used by local networks via
autoconfiguration.

I tried Mikrotik's 'router' name on a rawhide container:

# dig @127.0.0.53 router
gives status: NXDOMAIN
# dig @127.0.0.53 router.
gives status: NXDOMAIN

/etc/resolv.conf points to /run/systemd/resolve/resolv.conf, but getent
can deliver results. How is that possible?

# getent hosts router
192.168.88.1    router
# grep hosts /etc/nsswitch.conf
hosts: resolve [!UNAVAIL=return] myhostname files mdns4_minimal [NOTFOUND=return] dns

Of course, Mikrotik supports only DNS, not LLMNR or mDNS. Are you sure
you don't support your router's bogus domain, but forgot about other
vendors?
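The dnsmasq-style occasional probing mentioned above could be sketched like this in Python. The function name, the probe interval, and the selection policy are all my invention, just to show the shape of the idea:

```python
import random

def choose_servers(servers, query_count, probe_every=16):
    """Pick which servers receive this query. Normally only the
    primary; every `probe_every`-th query additionally probes one
    non-primary server so its RTT stays measured even while unused.
    Hypothetical sketch of the dnsmasq-like behaviour described above,
    not dnsmasq's actual algorithm."""
    targets = [servers[0]]
    if len(servers) > 1 and query_count % probe_every == 0:
        # Duplicate the query to a random secondary server; whichever
        # answer arrives first is used, and both RTTs get recorded.
        targets.append(random.choice(servers[1:]))
    return targets
```

The point is that only a small fraction of queries carry the probing cost, yet the resolver keeps fresh RTT data for every configured server and can notice when a secondary would be significantly faster.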
What I am saying is, resolved might choose only the second server from
the DHCP list, where only the first one knew about private domains, for
example. It is kind of a broken configuration, but not an uncommon one.
The router's DNS might be out of date and vulnerable to various attacks,
but it would be the first IP in the DNS server list from DHCP. The
second would be the ISP's resolver with up-to-date security updates.
Now, systemd-resolved understands DNS servers as a set, not an ordered
list, right? Would and should it choose the ISP's server or my home
router? Which metrics are checked?

> Hence, the "server spread" thing where queries are spread over a ton
> of DNS servers only really works if configured for the manual opt-in
> case. It's not something we could ever deploy by default. By default
> we need something that works, doesn't break private domains, and isn't
> slow.

I agree, but only partially. The original behaviour of the nss_dns.so
plugin was to always keep the order of used DNS servers constant. I am
not sure whether it is possible to prefer the first server and use the
following one only in case the first responds slowly or not at all. I
think the default mode should be strictly ordered requests for best
compatibility. Which is not the default now, correct?

Cheers,
Petr

--
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemensik@xxxxxxxxxx
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx