Re: This is bad, was Re: Fedora 33 System-Wide Change proposal: systemd-resolved

Paul Wouters <paul@xxxxxxxxx> · Tue, 29 Sep 2020 11:21:01 -0400 (EDT)

On Tue, 29 Sep 2020, Lennart Poettering wrote:

> Well, but how do you determine "local resources"?

This is not the proper question. The proper question is "what are you
trying to do". The .local domain discovery clearly is something meant
to be local.

I assume the real question is: How to convey my custom local network
domain to my local infrastructure. In the old days, it was what DHCP
gave you as domain. If you do that with your own network, then it is
pretty obvious. If you do it differently, you will have to convey this
somehow via configuration. This is why 20 years ago Microsoft added
"zones" to their network configuration. Is this a "home zone" or a "work
zone" or a "public wifi".

So I expect the information of "what is local" to live in
NetworkManager or systemd-networkd via configuration.

No further magic should be needed. The user selects this once when
joining a new network.

> Corporate networks tend to define local zones. Home wifi routers all
> do, too. There's no clear way to know what can go directly to well-known
> good DNS servers and what needs to be resolved locally.

Generally, resolve the names from the DHCP obtained domain name with
the DHCP obtained name servers. Yes, this is limited to one domain name,
which might not be ideal, but in general when you connect to a home or
corporate network directly (no VPN) then you should use their DNS
servers regardless. Enterprise is likely blocking port 53 (or DoT or
trying to block DoH) for security reasons. And your home network you
trust?

Most home routers these days allow configuration of a guest network
along with the native home network, for those who do not require services
on the home network and just need internet. It is the same as
using a public wifi in a coffeeshop or guest network at an enterprise
network. You might need to authenticate a captive portal and then you
should not trust the network for anything else and ideally only give
it encrypted packets (TLS, DoT to trusted DNS servers, VPN). If no
trusted DNS servers are configured on your device, you have no choice
but to trust their DNS servers.

For what the user deems is a "public wifi", there are simply never
any "local resources" other than an internet uplink to your own
remote resources.

In all the above scenarios, I see no ambiguity on which DNS servers
to use, except when multiple domain names exist within only the LAN,
which is rarely the case.
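
A minimal sketch of that choice for a directly connected network (no
VPN), in Python. The `Network` type, the zone names and `servers_for`
are made up for illustration and are not any NetworkManager or systemd
API; the addresses are documentation examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Network:
    zone: str                 # "home", "work" or "public", picked once by the user
    dhcp_servers: List[str]   # name servers obtained via DHCP
    dhcp_domain: str = ""     # domain obtained via DHCP, may be empty


def servers_for(network: Network, trusted_servers: List[str]) -> List[str]:
    """Pick the DNS servers to use on a directly connected network."""
    if network.zone in ("home", "work"):
        # Connected directly to a network the user trusts: use its servers.
        return network.dhcp_servers
    if network.zone == "public":
        # No local resources here. Prefer the user's own trusted servers
        # (hypothetically reached over DoT); without any, there is no
        # choice but to use what DHCP handed out.
        return trusted_servers or network.dhcp_servers
    raise ValueError(f"unknown zone: {network.zone!r}")
```

The point is only that the zone answers the question up front, so
nothing has to be probed or broadcast at query time.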

For the VPN scenario, it is just a little bit more complicated.

For those with proper standards, such as "Cisco IPsec" or "L2TP/IPsec",
the VPN configuration is dictated by the server to either send all or
some traffic to the VPN server. If it is not "everything", then these
VPNs convey one domain name and one or more IPs of DNS servers to use
to resolve that domain.

For IKEv2 IPsec based VPNs, any number of domain names can be specified
by the server to be used by the client. When doing split-DNS with DNSSEC
trust anchors, these can be conveyed and there are strict rules on when
to allow these to override public DNSSEC trust anchors as per RFC 8598.
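
To illustrate the split-DNS case, here is a rough Python sketch of
routing a query by the domains a VPN server conveyed. The function names
and the shape of `split_domains` are assumptions for illustration, not
libreswan or systemd-resolved code, and the sketch ignores the DNSSEC
trust anchor rules of RFC 8598 entirely.

```python
from typing import Dict, List


def normalize(name: str) -> str:
    """Lowercase a DNS name and drop the trailing root dot."""
    return name.rstrip(".").lower()


def route_query(qname: str,
                split_domains: Dict[str, List[str]],
                default_servers: List[str]) -> List[str]:
    """Return the servers for qname.

    split_domains maps an already normalized domain (as conveyed by the
    VPN server) to the DNS servers that are authoritative for it from the
    client's point of view.
    """
    labels = normalize(qname).split(".")
    # Walk from the full name towards the TLD so the longest matching
    # domain wins, and match on label boundaries only.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in split_domains:
            return split_domains[suffix]
    # Not a VPN domain: use the servers of the underlying network.
    return default_servers
```

Each name maps to exactly one set of servers; nothing is sent to both
the VPN and the local network.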

For VPN protocols with no real standard, things are more complicated.

OpenVPN can do custom things. It all depends on the provisioning.

WireGuard has nothing related to DNS, it is all hidden in the per-vendor
proprietary provisioning code. Perhaps the "wg-dynamic" userland
protocol will address this. Let's hope they read RFC 8598 for
inspiration to avoid the mistakes of IPsec 20 years ago.

What is important with all of the VPN cases is that you properly flush
the cache when the VPN establishes and terminates, to avoid having
unreachable IPs in your DNS cache. It's important not to flush other
DNS data, to avoid DNS fingerprinting of users across different networks.
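
As a sketch of what such a selective flush could look like (Python, with
a made-up cache layout; real caches in unbound or systemd-resolved are
structured differently):

```python
from typing import Dict, Iterable, Tuple

# Hypothetical cache layout: lowercase name -> (address, link it was learned on)
Cache = Dict[str, Tuple[str, str]]


def in_domains(name: str, domains: Iterable[str]) -> bool:
    """True if name equals one of the domains or is a subdomain of one."""
    return any(name == d or name.endswith("." + d) for d in domains)


def flush_for_vpn(cache: Cache, vpn_link: str,
                  vpn_domains: Iterable[str]) -> int:
    """Drop only the entries affected by a VPN coming up or going down.

    Removed: anything learned over the VPN link, plus anything inside the
    VPN's domains that was learned elsewhere. Everything else is kept, so
    the user does not re-query their whole history on every network change.
    Returns the number of entries removed.
    """
    domains = [d.rstrip(".").lower() for d in vpn_domains]
    stale = [name for name, (_addr, link) in cache.items()
             if link == vpn_link or in_domains(name, domains)]
    for name in stale:
        del cache[name]
    return len(stale)
```

The same call would run on both establish and terminate; in this sketch
the only difference is which of the two conditions tends to match.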

It seems resolvectl is the API to support this with systemd-resolved.

In short, I don't understand the issue raised here of "How do you
determine local resources".

For each and every domain name in the above scenario it is obvious what
nameserver to send it to. There is never a need to broadcast this over
a mix of public / private DNS servers.

> Also, people would react very allergic if we'd start sending all DNS
> traffic to google or so.

So this feature has no purpose as far as I can see and is never ever a
good idea, unless the user is specifically told their choice is to
disconnect from a broken network or try to use the broken network with
well known public DNS servers as a last resort.

> Yes, resolved implements DNSSEC. But from my experience I can tell you
> it's very hard to do in a way reasonably compatible with DNS servers
> deployed out there, in particular edge ones. Things mostly work, but
> DNS servers are all broken in different ways, and we can't possibly
> test things on all possible cheap wifi hw...

Which is why the DNSSEC validation code should have been left to the
large DNS teams at ISC, NLnetlabs, nic.cz, powerdns, IETF/ICANN
communities etc. The problem that systemd-resolved claims to have been
written for, determining when and where to send which DNS queries, is
completely unrelated to DNSSEC and its deployment/implementation
protocol interop issues and corner cases.

It was never required that systemd-resolved use its own DNSSEC validation
code. I warned not to do this. The DNS community spends tens of millions
of dollars a year on writing and maintaining DNS libraries and daemons
and on protocol updates.

libreswan based its VPN DNS reconfiguration on the unbound daemon and
libunbound. This work actually collaborated with NLnetlabs to extend
unbound for all the VPN use cases to reconfigure the DNS server for
all kinds of VPN domain scenarios.

FreeBSD has started using unbound with their own unbound reconfiguration
tooling around it.

systemd-resolved resources were spent re-inventing DNSSEC
implementations, making many of the same mistakes that the existing DNS
libraries made, and it is still buggy resolving certain complicated CNAME
and wildcard scenarios with NSEC3. This is not because systemd-resolved
programmers are bad. It is because implementing and maintaining DNSSEC
is a million-dollar-a-year operation. This money results in 3-4
production quality, well maintained DNSSEC implementations that Linux
can choose from. systemd-resolved simply does not have the resources
to do this themselves, as is evident from the 1300 open bugs on github
right now.

systemd-resolved should use an existing DNSSEC library. It can open a
separate DNS cache for each interface's supplied DNS servers.
It can route DNS queries to the proper DNS cache. It will automatically
get fixes and new record types supported by updates to these DNS
libraries.

systemd-resolved should focus on what it needs to do. Learn and
reconfigure the stream of DNS queries to the right servers. It should
get out of the DNS resolving and DNSSEC validation and DNS caching
business.

> (One thing I definitely want to add is an option to only do DNSSEC if
> DoT is also done, under the assumption that a DNS server that is good
> enough and new enough to implement the latter also should be able to
> do the former sanely.)

That assumption might be true now, but 5 years down the line there will
be bugs and corner cases and not enough resources for systemd-resolved
to track and handle this.

Also, the "only do DNSSEC if" is not a valid choice. Let's remember this
whole thread started with my system getting broken because DNSSEC was
silently dropped by systemd-resolved after a system upgrade.

> No, it's not. It's extremely difficult. Cheap wifi router DNS servers
> are broken in so many ways. They return errors in some cases, freeze
> in others, return rubbish in others, or not at all in even others. If
> you ask the wrong questions anything can happen.

This is why systemd-resolved should use a DNS library and not invent its
own thing. The teams at ISC, NLnetlabs, NIC.CZ, PowerDNS have spent the
last 20 years dealing with this and solving it. Use their code. You
don't have the resources to do this yourselves. Again, 1300 open bugs
on github show you have never managed to dig out of this hole.

> We pretty carefully test and probe DNS servers but this still comes at
> the price that on a particularly bad implementation we might take a long
> time until we figure out that DNSSEC simply is not possible.

See above. Also, the fix you applied now is to disable DNSSEC by
default, damaging all the installed servers on enterprise networks
that depend on and receive completely valid DNSSEC traffic. So I am
sorry if I strongly disagree with "pretty carefully test and probe".
That's not what happened to my laptop as VPN client and my mail server.

> The simple fact that some DNS servers don't respond at all if you ask
> the "wrong" questions is already a problem: it means you have to wait
> for a timeout (which means super long lookups initially) or do queries
> in parallel. That however is a problem too since other DNS servers
> really don't like it if you ask them multiple questions at
> once. Bombarding DNS servers with multiple questions all at once and
> seeing if one "sticks" isn't a workable strategy hence either.

Stop re-inventing the wheel. Bind, unbound, knot, powerdns do this
with far more resources than you have, and for many more years
than you have, and they are far more aware of these issues than you
are as they see a vastly larger audience with issues than the Linux
desktop niche market. When systemd-resolved on github closes a bug
reported and explained by Mark Andrews of Bind, the result is a bug
in systemd-resolved.

> So I think we do quite well in resolved on the DNSSEC front actually,

Compared to the dedicated DNS teams at the mentioned opensource DNS
software, systemd-resolved is not doing quite well. It is doing poorly.
Its developers are not attending the DNS conferences where issues are
discussed. They are not at IETF, not at ICANN, not at DNS-OARC, not
at RIPE. I have never seen systemd-resolved people participating in the
wider DNS community, a community of hundreds of DNS engineers.

So let me ExecSum what I wrote here. For systemd-resolved to become
a high quality DNS solution:

1) Remove custom DNS/DNSSEC resolving code and use a well maintained
   DNS library.
2) Maintain a per interface DNS cache using these libraries
3) Use the above sketched out process to improve your process of
   deciding which interface to send the query to. This is the core
   of what systemd-resolved should give to the user. It is probably
   already pretty close to this when we work on integrating VPN support.
4) Deal with hotspots separately
5) Support user configured/prompted fallback using DoT and DoH to well
   known servers in case obtained DNS servers are too broken to work
   well (with DNSSEC)
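
For item 5), a hedged Python sketch of what "user configured/prompted
fallback" could mean in practice. Everything here is hypothetical: the
`query` callable stands in for whatever DNS library does the actual
resolving (per item 1), `user_allows_fallback` stands in for a prompt or
a stored preference, and `flaky_query` is only a stand-in for the demo.

```python
from typing import Callable, List


def resolve_with_fallback(qname: str,
                          link_servers: List[str],
                          fallback_servers: List[str],
                          query: Callable[[str, List[str]], str],
                          user_allows_fallback: Callable[[], bool]) -> str:
    """Try the servers the network gave us; fall back only with consent.

    query(qname, servers) is assumed to return an answer or raise OSError
    when the servers are too broken to use.
    """
    try:
        return query(qname, link_servers)
    except OSError:
        # Never switch to well-known public servers silently.
        if not fallback_servers or not user_allows_fallback():
            raise
        return query(qname, fallback_servers)


def flaky_query(qname: str, servers: List[str]) -> str:
    """Demo stand-in: pretends the DHCP-supplied server 192.0.2.1 is broken."""
    if "192.0.2.1" in servers:
        raise OSError("no usable answer")
    return "203.0.113.7"
```

If the user declines, the original error propagates, which corresponds
to the "disconnect from a broken network" choice.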

No one else but systemd-resolved has items 2) and 3), and we only had a
badly working dnssec-trigger that tried to do this. This is where
systemd-resolved can shine.

I would seriously FALL IN LOVE with systemd-resolved for doing 2) and 3)
even if I had to sometimes manually do 4) and 5)

I will work on extending 3) with VPN support in libreswan for IKEv1 and
IKEv2 based IPsec VPNs.

But 1) is crucial to widespread voluntary adoption. Without 1) we have
no choice but to allow the user to completely disable/remove
systemd-resolved from their system.

Paul