Re: default local DNS caching name server

On Sun, 13 Apr 2014, William Brown wrote:

Yes. It depends on the "trustworthiness" of the network and/or
preconfiguration of some of your own networks you join.

Not really: Every network you join, you have to semi-trust. If you don't
trust it, why did you join it?

You don't always control which networks your device roams onto. If I ever
agreed to the starbucks network on my street, my phone will connect to
any network named starbucks, even if it is yours. So to draw the line at
the user _knowingly_ joining a network, we drew it at "plugged in
physically or provided the authentication credentials".

Works reasonably well with unbound+dnssec-trigger; it could use better NM
integration for captive portals.

But you can't account for every captive portal in the world. This is why
the cache is a bad idea, because you can't possibly account for every
system that is captive like this.

Yes we can by monitoring for "captivity signs" when a new network is
joined. Again, please yum install dnssec-trigger on your laptop and
start the dnssec-trigger applet once, and go have a coffee outside.
Let us know your experience.

Case 2: Moderate home user. They have a little knowledge of DNS, and
have set up a system like OpenWRT or gargoyle on their router. They have
their own zone, .local. This means that their DHCP provides the DNS ip
of the router to clients.

Same if their wifi is closed (eg WPA2); if their wifi is open, they will
need an exception in NM for the .local forward.

What if I call my network .concrete. Or .starfish. Or any other weird
thing I have seen on personal networks. Again, you cannot bypass the
local network DNS as the forwarder. You must respect it.

We will! If your DHCP has:

			option domain-name-servers	10.1.2.3;
			option domain-name "starfish";

Then unbound would get a forward configured to use 10.1.2.3 for the domain .starfish,
basically calling:

sudo unbound-control forward_add starfish 10.1.2.3
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist

When you leave the network, forward_remove is called.

sudo unbound-control forward_remove starfish
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
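As a sketch, these add/remove calls could be tied to network events with a NetworkManager dispatcher hook along the following lines. The script path, variable names and the UNBOUND_CTL dry-run override are my assumptions for illustration, not the actual dnssec-trigger implementation:

```shell
#!/bin/sh
# Hypothetical hook: /etc/NetworkManager/dispatcher.d/50-unbound-forward
# DOMAIN and DNS are assumed to carry the DHCP-provided values.
# UNBOUND_CTL can be overridden (e.g. with "echo") to dry-run the hook.
UNBOUND_CTL="${UNBOUND_CTL:-unbound-control}"

add_forward() {
    domain="$1"; server="$2"
    # Forward the DHCP-provided domain to the DHCP-provided server,
    # then drop any stale cache entries for it.
    $UNBOUND_CTL forward_add "$domain" "$server"
    $UNBOUND_CTL flush "$domain"
    $UNBOUND_CTL flush_requestlist
}

remove_forward() {
    domain="$1"
    # On leaving the network, remove the forward and flush again.
    $UNBOUND_CTL forward_remove "$domain"
    $UNBOUND_CTL flush "$domain"
    $UNBOUND_CTL flush_requestlist
}

# NM dispatcher scripts receive the interface as $1 and the event as $2.
case "${2:-}" in
    up)   add_forward "$DOMAIN" "$DNS" ;;
    down) remove_forward "$DOMAIN" ;;
esac
```

Overriding UNBOUND_CTL with echo lets you inspect what the hook would run without touching a live unbound.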

When connecting to their LAN or secure wifi, same as above for one
forwarding zone. Multiple forwarding zones would need to be configured.
If it is an enterprise, they might need their corporate CAs as well as
their zones configuration, so a corporate rpm package would make sense.

How do you plan to make this work? You can't magically discover all the
DNS zones hosted in an enterprise. At my work we run nearly 100 zones,
and they are all based at different points (i.e. a.com, b.com, c.com).
You cannot assume a business has just "a.com" and you can forward all
queries for subtree.a.com to that network server.

If you are that large a business, you should really have a corporate
build rpm package with your enterprise information such as local CA,
local zones, etc. DNS forwarder zones can be dropped into
/etc/unbound/*.d/ currently. I would expect we would make this software
neutral via NM integration, where an NM unbound plugin would use those
directories. We could add a per-network option that specifies to use a
forward for "." (everything) instead of just the DHCP specified domain,
or perhaps even do this for trusted (see above) networks.

However, that should not be the default for open wifi networks for
security reasons.
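For illustration, a forwarder-zones file dropped into one of the /etc/unbound/*.d/ directories mentioned above could contain plain unbound forward-zone clauses; the zone names and server address below are made up:

```
# corp.conf - example enterprise drop-in (hypothetical zones and address)
forward-zone:
    name: "a.com"
    forward-addr: 192.0.2.10
forward-zone:
    name: "b.com"
    forward-addr: 192.0.2.10
```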


Again, you *must* respect the DHCP provided DNS server as the forwarder
else you will savagely break things.

And not doing anything will cause people to have insecure DNS. So I think
the question should be turned around a little bit. There is a need for
DNSSEC on the end nodes - how can we best facilitate that while trying
to be as supportive of current deployments as we can be? That is what
we are trying to do. If you only counter with "I require insecure DNS
for my network to function" or "all cache is evil", then you are not
open-minded enough about the realities of requiring DNSSEC support.

Same, already works if you only need the one domain that is negotiated
via the VPN (eg the IKE XAUTH domain).

You can negotiate more than one domain on a VPN .... again, see above.

Not with IPsec/XAUTH. If more domains can come in via openvpn or
something, then I would assume the existing openvpn unbound plugin
already deals with that case. If not, please file a bug and we will fix
it.

We are not suggesting that for LAN or secure wifi. In those cases the
forward will be added. However, you don't want those forwards for open
wifi, or else I can bring up "linksys", push you a forward for your
internal.domain.com and mislead you into thinking you would be going
over your VPN.

This is a more serious problem than a caching resolver could hope to
solve, as it shows malicious intent.

I'm sorry I don't understand what you are trying to say here.

Case 1: The user doesn't know much about DNS. The ISP might be reliable
or unreliable. If we assume as discussed that the cache is flushed on
network change, they will have an empty cache.

The cache is never fully flushed. It is only flushed for the domain
obtained via DHCP or VPN, because those entries can change. Nothing
else is flushed. If the upstream ISP could have spoofed the rest, so
be it - the publishers of those domains could have used DNSSEC to
prevent that from happening.

No no no!!!! You need to flush *all* entries. Consider what I resolve
www.google.com to. That changes *per* ISP because google provides
different DNS endpoints and zones to ISPs to optimise traffic! So when I
use google at work, I'm now getting a suboptimal route to their servers!

google publishes TTLs for that which are honoured. If google requires
different records when you switch ISPs, they need to use shorter TTLs.
The publisher decides here, not the consumer. Additionally, to resolve
these issues, there is a new draft that has been implemented by some
(such as opendns which specifically has this problem at a large scale):

https://tools.ietf.org/html/draft-vandergaast-edns-client-subnet-02

So I consider this a solved problem, even if the code and deployment
are not fully there yet.

So that's a valid point: A non-caching unbound that caps TTLs is a good
idea, but as you say, you can't stop a dodgy ISP.

Actually you can! A captive hotspot is not much different from a dodgy
ISP. unbound tries its best not to use any DNS server that messes with
DNS. ISPs like Rogers, who like to rewrite DNS packets, are explicitly
not used by unbound - it prefers to become a full recursive server
without offloading to any forwarder if the forwarder is that malicious.
We even run DNS resolvers as Fedora infrastructure that provide DNS
over TCP port 80 and DNS over TLS on port 443 as alternatives, to work
around those broken ISPs that also block port 53 in an attempt to force
you to use their DNS lies.

Case 2: The user does know a bit. But when they change name records they
may not be able to solve why a workstation can't resolve names like
other clients.

While we could flush the entire cache on (dis)connect, I think that's
rather drastic for this kind of odd use-case. If the user runs their own
zone and their own records, they should know about DNS and TTLs. But
even so, NM could offer an option to flush the DNS cache.

But this isn't even an odd use case. There are enough power users in the
world who do this. It's not just computer enthusiasts, I know a chemist
who did this, and others. You can't just assume a generic case, and then
break it for others.

If you are changing DNS records, you need to understand TTLs and cache
flushing. If you don't, then sure, you can be the clueless Windows user
who reboots their machine. I care much more about some of the more
realistic use cases of fedora machines connected over 3G, where latency
matters and flushing the entire cache would cause both more traffic and
more latency. And about things like pre-fetching, where we renew cached
DNS entries that are still being served from cache, to avoid an outage
when the record expires.

Case 3: This user does understand DNS, and they don't need DNS cache.

That depends. You need caching for DNSSEC validation, so really, every
device needs a cache, unless you want to outsource your DNSSEC
validation over an insecure transport (LAN). That seems like a very bad
idea.

If your lan is insecure, you have other issues. That isn't the problem
you are trying to solve.

Yes it is. When I'm at the coffee shop, my LAN is insecure. I don't want
to trust DNS answers coming in. I want to validate those using DNSSEC
on my own device. So I need to run a validating recursive (caching) nameserver
for very valid security reasons - so that the guy next to me cannot spoof
paypal.com.

They have bind / named set up, and they would like to rely on that
instead.

They can. DNS caches are chained. There is no reason to say you cannot
run your own cache and have a network based cache.
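For example, a local unbound cache could be chained to the network's bind cache with a single forward-zone for the root; the router address below is made up for illustration:

```
# unbound.conf fragment: forward everything to the LAN's bind cache
forward-zone:
    name: "."
    forward-addr: 192.168.1.1
```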

But you don't *need* it. I went to the effort of setting up my own bind
to cache; I shouldn't need it on my system. Again, local caches cause all
kinds of issues. A home user is likely to toy with things, set a
high-ish TTL, say even 10 minutes, and change records on their server.
Then their records appear broken, because the local cache hasn't expired
yet.

See above where the same argument was discussed. But also, you would
have the exact same problem on many devices on your network that won't
throw away that DNS record immediately: in-browser caches, the OSX
system-wide cache, and who knows what your PVR, game console and TV do
these days. If this worked for you in the past, you were lucky AND you
engineered it to work. If you handed that solution to your
unknowledgeable chemist, it's time to update their solution to meet the
modern demand of facilitating DNSSEC on every device.

When they change records in their local zones, they don't want
to have to flush caches etc. If their ISP is unreliable, or their own
DNS is unreliable, a DNS cache will potentially mask this issue delaying
them from noticing / solving the problem.

This is becoming really contrived. Again, if you think this is a real
scenario (I don't think it is) then you could run unbound with ttl=0.
But demanding that we automagically understand what a local zone is,
automagically detect when a remote authoritative DNS server changes its
data, while refusing to enforce that with ttl=0, and then using that as
an argument against any unbound solution that provides a security
feature (DNSSEC), is getting a little unrealistic. If you want your
laptop to start validating TLSA, SSHFP and OPENPGPKEY records, you need
DNSSEC validation on the device. The question should be "how do you
change your network requirements to meet that goal". Yes, enforcing
security comes at a price.
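A sketch of that ttl=0 workaround as an unbound.conf server option, as I understand the cache-max-ttl option (verify against your unbound documentation):

```
server:
    # cap every cached entry at a TTL of 0,
    # effectively disabling useful caching
    cache-max-ttl: 0
```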

It's not contrived: This is a common network setup for all the people I
know who are enthusiasts; it is how they set up their home networks. This
is why it's a use case.

I suggest you keep a close eye on the IETF HOMENET people, because
DNSSEC is coming into your home automation one way or the other, and if
you depend on this system, you will run into trouble in the future.

Let me use your scenario based on TLS. You want to be able to change
your TLS certificates and the private CA you regenerate at any time,
without any browser on your network ever giving you a popup warning.
You know you cannot ask this - it goes against the security model. The
same applies to DNS with DNSSEC. The security model demands that we do
validation and caching, and we should try to make that as flexible and
painless as possible.

The issue is that by adding DNSSEC in this way, you are going to cause a
great deal of pain because of these caches. Add DNSSEC, but if you need
to cache, cache for the most minimal time possible.

As I argued in the last few days, I do not see this "great deal of
pain" and I've provided an unbound workaround for you, and your corner
case can be dealt with via a new NM option.

It's linked to the other cases. It's the point that local system caches
aren't needed as you have access to highly reliable DNS systems.

You will just have to come to terms with the fact that caches are needed
when you are doing constant DNSSEC validation. So your argument that
caches are not needed might have been true in the past, but is no
longer. Now let's work on ensuring your exception cases can be supported
in the presence of caches.

Additionally, business networks are "trusted" so you can trust their DNS
caches etc. (to a point)

Business networks are never compromised? But as I stated, we already
said we will do the forward using the DHCP supplied nameserver in case
of a LAN or secured WIFI connection.

Case 8: VPNs are a bit unreliable, and have relatively high(ish)
latency. But mostly they are quite good, e.g. openvpn. A DNS cache
*might* help here in case of traffic loss. Again, this would be masking
a greater issue though, and could be better solved with TCP DNS queries
rather than UDP.

The VPN cases already work very well in Fedora. I seamlessly connect to
and disconnect from the redhat VPN. Resources that are available only via
the VPN are never blocked by a wrong DNS cache entry I got from when the
VPN was down. VPNs are a non-issue.

Consider a business with external and internal DNS zones. This becomes
an issue in this case. If you have cached say "website.example.com" to
the external IP, and that is DMZed somehow on the internal network, when
you change to VPN, you need to use the internal view of that zone
instead. But you can't: the name is cached.

Which is why we flush the cache for the domain in question when we
detect a network change. See the above unbound commands used. This is a
solved problem. Every day, when my VPN is up I reach bugzilla.redhat.com
on its internal IP, and when my VPN is down I reach bugzilla.redhat.com
on its external IP. Without any manual intervention. It just works.

No, cache is not a feature. It's a chronic issue.

Then please let us know what you intend to replace DNS with. The reason
DNS has worked for over 20 years is because it is a caching system.

Look at Windows
systems, where service desks around the world always advise that the
first step is a reboot: Why? To flush DNS caches (or other things). When
you can't get to a website? Restart the web browser, to flush the cache.
Intermittent network issues for different people on a network? The cache
is allowing some people to work, but masking the issue from them. It's
not allowing people to quickly and effectively isolate issues.

If the DNS cache were the only cause for Windows machines to need a
reboot, I'm sure Microsoft would have fixed that by now. Let's remain
honest here and say there are 1001 reasons why Windows users reboot
their machines. DNS might be one of them, but it has no relationship to
the discussion we are having right now.

DNSSEC is a good idea: Caches are a problem.

We disagree.

If this really is to be used, I cannot stress enough that a cache must
be completely flushed every time the default route or network interface
changes. You can't, and I can't, possibly conceive of every network setup
in the world. If you make assumptions like this, systems will break and
fedora will be blamed.

Consider some of the options I suggested for addition to NM to accommodate
your scenario, or suggest alternatives. If you believe the only
solution is "no cache ever", then there is not much more we can talk
about. And if the majority of fedora users prefers an insecure no-cache
setup over a DNSSEC-cache solution, I guess I will go elsewhere and stop
running Fedora.

Paul
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct




