> > But you can't account for every captive portal in the world. This is
> > why the cache is a bad idea, because you can't possibly account for
> > every system that is captive like this.
>
> Yes we can, by monitoring for "captivity signs" when a new network is
> joined. Again, please yum install dnssec-trigger on your laptop and
> start the dnssec-trigger applet once, and go have a coffee outside.
> Let us know your experience.

What is a "captivity sign", as you put it?

> > What if I call my network .concrete. Or .starfish. Or any other weird
> > thing I have seen on personal networks. Again, you cannot bypass the
> > local network DNS as the forwarder. You must respect it.
>
> We will! If your DHCP has:
>
>     option domain-name-servers 10.1.2.3;
>     option domain-name "starfish";
>
> then unbound would get a forward configured to use 10.1.2.3 for the
> domain .starfish, basically calling:
>
>     sudo unbound-control forward_add starfish 10.1.2.3
>     sudo unbound-control flush starfish
>     sudo unbound-control flush_requestlist
>
> When you leave the network, forward_remove is called:
>
>     sudo unbound-control forward_remove starfish
>     sudo unbound-control flush starfish
>     sudo unbound-control flush_requestlist

Okay, so let's expand this to my workplace, which runs a university
network with thousands of students connected. We have many zones, rooted
at different points: services.university.edu, university.edu,
medicalcenter.org, ersearch.com, etc. We can't possibly put all of these
into our "domain-name" DHCP option; IIRC it's a single-value attribute
anyway. So how does unbound handle this? Does it bypass my network DNS
servers completely for everything that isn't university.edu or a child
of it? IMO that's not acceptable behaviour.

> >> When connecting to their LAN or secure wifi, same as above for one
> >> forwarding zone. Multiple forwarding zones would need to be
> >> configured.
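For what it's worth, "multiple forwarding zones" can be written out
statically as one forward-zone stanza per zone in unbound.conf syntax.
This is only a sketch with illustrative zone names and addresses:

```
forward-zone:
    name: "university.edu"
    forward-addr: 10.0.0.53

forward-zone:
    name: "medicalcenter.org"
    forward-addr: 10.0.0.53
```

But note that nothing discovers these stanzas automatically; someone has
to know the full zone list up front, which is exactly my objection.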
> >> If it is an enterprise, they might need their corporate CAs as well
> >> as their zones configuration, so a corporate rpm package would make
> >> sense.
>
> > How do you plan to make this work? You can't magically discover all
> > the DNS zones hosted in an enterprise. At my work we run nearly 100
> > zones, and they are all based at different points (i.e. a.com, b.com,
> > c.com). You cannot assume a business has just "a.com" and you can
> > forward all queries for subtree.a.com to that network server.
>
> If you are that large a business, you should really have a corporate
> build rpm package with your enterprise information such as local CA,
> local zones, etc. DNS forwarder zones can be dropped into
> /etc/unbound/*.d/ currently. I would expect we would make this software
> neutral via NM integration, where an NM unbound plugin would use those
> directories. We could add a per-network option that specifies to use a
> forward for "." (everything) instead of just the DHCP specified domain,
> or perhaps even do this for trusted (see above) networks.
>
> However, that should not be the default for open wifi networks for
> security reasons.

See above: we can't possibly hope to deploy such a package to students
and staff who bring their own devices. How do you propose we populate
all the needed forwarders for our students? (Sure, maybe only a few
hundred use Linux/Fedora, but it will cause them to have a negative view
of the OS.)

> > Again, you *must* respect the DHCP provided DNS server as the
> > forwarder else you will savagely break things.
>
> And not doing anything will cause people to have insecure DNS. So I
> think the question should be turned around a little bit. There is a
> need for DNSSEC on the end nodes - how can we best facilitate that
> while trying to be as supportive of current deployments as we can be?
> That is what we are trying to do.
> If you only counter with "I require insecure DNS for my network to
> function" or "all cache is evil", then you are not open-minded enough
> to the realities of the requirement of DNSSEC support.

Sure, let's agree we "need" DNSSEC, and that it follows we need a cache.
Set cache times deliberately low so that careless network admins don't
break things (even 300 seconds). Don't try to bypass the local network
DNS: there are more network configurations in the world than you or I
can contemplate, and bypassing it *will* break things for people.

> >>> Case 1: The user doesn't know much about DNS. The ISP might be
> >>> reliable or unreliable. If we assume as discussed that the cache is
> >>> flushed on network change, they will have an empty cache.
> >>
> >> The cache is never fully flushed. It is only flushed for the domain
> >> obtained via DHCP or VPN, because those entries can change. They are
> >> not changed for anything else. If the upstream ISP could have
> >> spoofed them, so be it - the publisher of the domains could have
> >> used DNSSEC to prevent that from happening.
> >
> > No no no!!!! You need to flush *all* entries. Consider what I resolve
> > www.google.com to. That changes *per* ISP because google provides
> > different DNS endpoints and zones to ISPs to optimise traffic! So
> > when I use google at work, I'm now getting a suboptimal route to
> > their servers!
>
> google publishes TTLs for that which are honoured. If google requires
> different records when you switch ISPs, they need to use shorter TTLs.
> The publisher decides here, not the consumer. Additionally, to resolve
> these issues, there is a new draft that has been implemented by some
> (such as opendns which specifically has this problem at a large scale):
>
> https://tools.ietf.org/html/draft-vandergaast-edns-client-subnet-02
>
> So I consider this a solved problem, even if code and deployment is not
> there yet at this moment.
See also my comments about internal and external zones on a network. If
you want to cache, then you can't assume that what I cache on network A
will be valid on network B. Consider the home user with the dodgy ISP
that sets all TTLs to, say, 30 days. Do you want that user to take that
cached entry to a working network and be using that cache for 30 days
(or whatever unbound sets its TTL maximum to)?

> > So that's a valid point: A non-caching unbound that caps TTLs is a
> > good idea, but as you say, you can't stop a dodgy ISP.
>
> Actually you can! A captive hotspot is not much different from a dodgy
> ISP. unbound tries its best to not use any DNS server that messes with
> DNS. So ISPs like Rogers who like to rewrite DNS packets are explicitly
> not used by unbound - it prefers to become a full recursive server
> without offloading to any forwarder if the forwarder is that malicious.
> We even run DNS resolvers as Fedora infrastructure that provide DNS
> over TCP-80 and DNS over TLS-443 as alternatives to work around these
> broken ISPs that also block port 53 in an attempt to force you to use
> their DNS lies.

But you can't really tell what's a dodgy DNS server and what's not.
There are plenty of good ISPs with well-configured DNS systems that you
*should* use as a forwarder. Again, you can't determine what zones exist
in such a DNS server, so that you could use it "just for those" and
bypass it for everything else. Consider also that some ISPs force all
port 53 traffic to their own DNS servers. How does unbound know when the
ISP is forcing this?

Essentially, what I'm hearing at the moment is that the proposal isn't
just a caching DNS server. It's a DNS server that:

* validates DNSSEC,
* caches, and
* attempts to always bypass my local DNS forwarder.

> > >>> Case 2: The user does know a bit. But when they change name
> > >>> records they may not be able to solve why a workstation can't
> > >>> resolve names like other clients.
> >>
> >> While we could flush the entire cache on (dis)connect, I think
> >> that's rather drastic for this kind of odd use-case. If the user
> >> runs their own zone and their own records, they should know about
> >> DNS and TTLs. But even so, NM could offer an option to flush the
> >> DNS cache.
> >
> > But this isn't even an odd use case. There are enough power users in
> > the world who do this. It's not just computer enthusiasts, I know a
> > chemist who did this, and others. You can't just assume a generic
> > case, and then break it for others.
>
> If you are changing DNS records, you need to understand TTL and cache
> flushing. If you don't, then sure, you can be the clueless Windows user
> that reboots their machine. I care much more about some of the more
> realistic use cases of fedora machines connected over 3G, where latency
> matters and flushing the entire cache would cause both more traffic and
> more latency. And things like pre-fetching, where we renew cached DNS
> entries that are still being served from cache, to avoid the outage
> when the record expires.

> >>> Case 3: This user does understand DNS, and they don't need DNS
> >>> cache.
> >>
> >> That depends. You need caching for DNSSEC validation, so really,
> >> every device needs a cache, unless you want to outsource your DNSSEC
> >> validation over an insecure transport (LAN). That seems like a very
> >> bad idea.
> >
> > If your LAN is insecure, you have other issues. That isn't the
> > problem you are trying to solve.
>
> Yes it is. When I'm at the coffee shop, my LAN is insecure. I don't
> want to trust DNS answers coming in. I want to validate those using
> DNSSEC on my own device. So I need to run a validating recursive
> (caching) nameserver for very valid security reasons - so that the guy
> next to me cannot spoof paypal.com.

DNSSEC doesn't solve the coffee shop problem: you're still on open
wireless and there are plenty of other attacks you are still vulnerable
to. Sure, it helps.
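To be fair, the validation side of this is cheap to enable locally. A
sketch of the relevant unbound.conf lines follows; the trust-anchor path
is what I believe the Fedora package uses, so treat it as an assumption:

```
server:
    # Fetch and maintain the root trust anchor (RFC 5011 tracking),
    # which makes unbound validate answers against the DNSSEC chain.
    auto-trust-anchor-file: "/var/lib/unbound/root.key"
```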
But this is DNSSEC helping, not the cache.

> > Look at Windows systems, where service desks around the world always
> > advise that the first step is a reboot: why? To flush DNS caches (or
> > other things). When you can't get to a website? Restart the web
> > browser, to flush the cache. Intermittent network issues for
> > different people on a network? The cache is allowing some people to
> > work, but masking the issue from them. It's not allowing people to
> > quickly and effectively isolate issues.
>
> If DNS cache was the only cause for Windows machines to need a reboot,
> I'm sure Microsoft would have fixed that by now. Let's remain honest
> here and say there are a 1001 reasons why Windows users reboot their
> machines. DNS might be one of them but it has no relationship to the
> discussion we are having right now.

That's deflecting the point. The first advice when you can't access some
website or service is to reboot, for exactly this reason.

> > DNSSEC is a good idea: caches are a problem.
>
> We disagree.

> > If this really is to be used, I cannot stress enough that the cache
> > must be completely flushed every time the default route or network
> > interface changes. Neither you nor I can possibly conceive of every
> > network setup in the world. If you make assumptions like this,
> > systems will break and Fedora will be blamed.
>
> Consider some of the options I suggested for addition to NM to
> accommodate your scenario, or suggest alternatives. If you believe the
> only solution is "no cache ever", then there is not much more we can
> talk about. And if the majority of fedora users prefers an insecure
> no-cache over a DNSSEC-cache solution, I guess I will go elsewhere and
> stop running Fedora.

I'm glad that the NM integration is being considered; that will help. I
might not be afraid to touch a CLI, but I do think of users who use the
GUI only. I think that at the end of the day, there are more network
setups out there than we can both contemplate.
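The full flush I keep asking for is at least mechanically cheap. Here is
a hedged sketch of a dispatcher-style helper; this is hypothetical glue,
not actual dnssec-trigger code, it echoes the commands instead of
running them, and it assumes flush_zone on the root name drops
everything at or below it:

```shell
# Hypothetical helper for a NetworkManager dispatcher script.
# Echoes the unbound-control invocations rather than executing them,
# so the logic is readable without a live unbound daemon.
flush_dns_on_change() {
    action="$1"    # NetworkManager passes events such as "up" or "vpn-up"
    case "$action" in
        up|down|vpn-up|vpn-down)
            # Flushing the root name empties the whole cache;
            # flush_requestlist drops queries already in flight.
            echo "unbound-control flush_zone ."
            echo "unbound-control flush_requestlist"
            ;;
        *)
            # Other events (dhcp4-change, hostname, ...) leave the cache alone.
            ;;
    esac
}

flush_dns_on_change up
```

Swap the echoes for the real commands and something like this could live
under /etc/NetworkManager/dispatcher.d/.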
Some of them will be ready for DNSSEC and systems like unbound when the
time comes. (As stubborn as I may seem, when it becomes the default I
will try to make my systems work seamlessly with the OOB defaults.)

Consider how many networks don't advertise all their domain names via
DHCP. How many networks have more than one zone that unbound can't
magically discover. How many networks have split views. These are all
reasons to flush the cache on interface state change, because the world
won't magically make its networks "perfect" for Fedora's sake. We need
to work in a world of configurations ranging from sane to insane.

DNS as a caching system has worked because the caches on networks don't
move: they have one view of the world, and they keep it. If you have a
laptop or other system that moves around, and you take that view of the
DNS world with you, things will become screwy and may break in subtle
ways that ordinary users can't explain.

In summary, all I ask is that:

* If a forwarder exists on the network, unbound uses it for all queries.
  (You can't know all the internal zones it's holding, and often they
  are not all advertised.)
* If that forwarder returns an invalidly signed DNSSEC zone, unbound
  bypasses it for only that zone (i.e. the zone is being tampered with).
* Unbound flushes its cache on interface state changes, because you are
  moving between networks with different DNS views of the world.
* The DNS cache time is kept short, to help avoid issues with DNS admins
  who forcefully increase TTLs. Consider Google, with a TTL of 300.
  Perhaps even cap each cached record at its TTL or 3600 seconds,
  whichever is lower.

I'm trying to think about the "user experience" of Fedora here rather
than a technically perfect world. These suggestions would eliminate all
the concerns I have with this system and would hopefully make the
default experience better. :)

-- 
William Brown <william@xxxxxxxxxxxxxxx>
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct