On 11/17/20 10:12 AM, Lennart Poettering wrote:
> On Mo, 16.11.20 21:48, Petr Menšík (pemensik@xxxxxxxxxx) wrote:
>
>> But it does not have to learn everything about a server, because it
>> switched the active one. If it has to, try to find a way to store server
>> instance features per server IP, not per link.
>
> We do exactly this. But we also have a grace period after which we
> forget everything again, and go back to the best server feature level,
> which then takes some time to settle back on the actual server feature
> level.

I think smart RTT checking every few minutes would be better than always
forgetting everything about a server that responds quickly.

> When we always use the same server, then we probe it once, and use it
> for the full grace period without any further delays. When we use
> numerous servers, and a different one for each lookup, then this means
> we have to probe each and every single one of them once and that slows
> down things.

You can cache the average response time for each server. If an answer did
not arrive within triple the usual RTT, send the query to other servers
too. It should not be done for each request; it depends on how long the
grace period is, right?

> It's a very easy calculation: if you use n=1 servers for 500 lookups
> within the grace period, you experience 1 slow lookup in the worst
> case that required the feature probing, plus 499 speedy lookups
> because we already knew the earlier probing results. If otoh you use
> n=250 servers for 500 lookups you experience 250 worst case slow
> lookups, since we need to learn for each server individually what it
> can and can't do, plus 250 speedy lookups. And then, after the grace
> period is over, you get another 250 slow lookups...

I would think you can detect almost everything from 2 requests and
replies. Only special workarounds for broken implementations may take
more. It could work with a timeout that is barely visible to the user:
send a few queries to a secondary server, measuring its average
round-trip time.
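To make the idea concrete, the per-server RTT caching I have in mind could look roughly like this Python sketch. The class, the smoothing factor, and the "triple the usual RTT" threshold are my own illustration, not code from resolved or any real resolver:

```python
class ServerStats:
    """Track a smoothed round-trip time per server IP (hypothetical
    sketch, names and parameters are made up for illustration)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha    # EWMA smoothing factor for new samples
        self.avg_rtt = {}     # server IP -> smoothed RTT in seconds

    def record(self, server, rtt):
        """Fold a new RTT sample into the running average for a server."""
        prev = self.avg_rtt.get(server)
        self.avg_rtt[server] = rtt if prev is None else (
            self.alpha * rtt + (1 - self.alpha) * prev)

    def deadline(self, server, factor=3.0, default=0.5):
        """How long to wait before also asking other servers: triple
        the usual RTT, or a conservative default for unknown servers."""
        prev = self.avg_rtt.get(server)
        return default if prev is None else factor * prev
```

With something like this, a server that has always answered in 20 ms gets a 60 ms deadline before the resolver starts asking others, while an unmeasured server gets a generous default.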
If it was significantly better than the current server, switch to it.
Unless the connection changed, you can expect the previous quirks to
still apply, but you can still test response time. I think dnsmasq
occasionally sends a few requests to non-primary servers. If answers
don't arrive within hundreds of ms, it tries the others as well. So the
user gets an answer soon, and the response time of all used servers has
been measured, even if some of them are offline. It does not have to be
50% of queries just to test them.

>>> It might be something to add as opt-in, and come with the warning that
>>> you better list DNS servers that aren't crap if you want to use that,
>>> so that we never have to downgrade protocol level, and thus the
>>> learning phase is short.
>>
>> Sure enough, many router DNS implementations are bad or ugly. If it can
>> choose from a full featured validated ISP resolver and a crappy router
>> implementation, prefer the one with better features. Most likely it is
>> much better maintained as well.
>
> I am sorry, but that is not a suitable approach for "edge" DNS
> clients. We need to go through the DNS server info we acquired through
> DHCP or so, since private domains do exist. We must keep router admin
> pages accessible.

I haven't mentioned using any external DHCP client. The problem is,
there is no standard way to configure split DNS over DHCP, or even to
declare the private domains used by local networks via
autoconfiguration.

I tried Mikrotik's 'router' name on a rawhide container:

# dig @127.0.0.53 router
gives status: NXDOMAIN
# dig @127.0.0.53 router.
gives status: NXDOMAIN

/etc/resolv.conf points to /run/systemd/resolve/resolv.conf, but getent
can deliver results. How is that possible?

# getent hosts router
192.168.88.1    router
# grep hosts /etc/nsswitch.conf
hosts: resolve [!UNAVAIL=return] myhostname files mdns4_minimal [NOTFOUND=return] dns

Of course, Mikrotik supports only DNS, not LLMNR or mDNS. Are you sure
you don't support your router's bogus domain, but forgot about other
vendors?
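The dnsmasq-style occasional probing mentioned above could be sketched like this in Python. The function name, the probe interval, and the selection policy are all my invention, just to show the shape of the idea:

```python
import random

def choose_servers(servers, query_count, probe_every=16):
    """Pick which servers receive this query. Normally only the
    primary; every `probe_every`-th query additionally probes one
    non-primary server so its RTT stays measured even while unused.
    Hypothetical sketch of the dnsmasq-like behaviour described above,
    not dnsmasq's actual algorithm."""
    targets = [servers[0]]
    if len(servers) > 1 and query_count % probe_every == 0:
        # Duplicate the query to a random secondary server; whichever
        # answer arrives first is used, and both RTTs get recorded.
        targets.append(random.choice(servers[1:]))
    return targets
```

The point is that only a small fraction of queries carry the probing cost, yet the resolver keeps fresh RTT data for every configured server and can notice when a secondary would be significantly faster.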
What I am saying is, resolved might choose only the second server from
the DHCP list, where only the first one knew about private domains, for
example. It is kind of a broken configuration, but not an uncommon one.
The router's DNS might be out of date and vulnerable to various attacks,
but it would be the first IP in the DNS server list from DHCP. The
second would be the ISP's resolver with up-to-date security updates.
Now, systemd-resolved understands DNS servers as a set, not an ordered
list, right? Would and should it choose the ISP's server or my home
router? Which metrics are checked?

> Hence, the "server spread" thing where queries are spread over a ton
> of DNS servers only really works if configured for the manual opt-in
> case. It's not something we could ever deploy by default. By default
> we need something that works, doesn't break private domains, and isn't
> slow.

I agree, but only partially. The original behaviour of the nss_dns.so
plugin was to always keep the order of used DNS servers constant. I am
not sure whether it is possible to prefer the first server and use the
following one only in case the first responds slowly or not at all. I
think the default mode should be strictly ordered requests for best
compatibility. Which is not the default now, correct?

Cheers,
Petr

--
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemensik@xxxxxxxxxx
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx