Re: [Last-Call] [dns-privacy] Last Call: <draft-ietf-dprive-rfc7626-bis-04.txt> (DNS Privacy Considerations) to Informational RFC


 





On 22 Jan 2020, at 09:39, Rob Sayre <sayrer@xxxxxxxxx> wrote:

Hi,

Here are my last-call comments on draft-ietf-dprive-rfc7626-bis.


Thank you for your detailed 45-point review of this document during the second IETF last call.

18 of these points relate to text that is unchanged since the original RFC7626; I’ll mark these as [RFC7626]. To avoid repetition: since this text has already received two forms of consensus (IETF consensus on publishing RFC7626 and DPRIVE WGLC for this document), I am taking a conservative approach during the second IETF Last Call and will not remove or substantially change that text unless there is a good reason to do so.




# 1.  Introduction
#

> It is one of the most important infrastructure components of the Internet
> and often ignored or misunderstood by Internet users (and even by many
> professionals).

This text is carried over from RFC 7626, but it doesn't seem grounded in fact.

[RFC7626] I don’t recall any other review of this document contesting this statement. 


> Because DNS relies on caching heavily...

Suggest rewriting this paragraph to avoid starting two sentences with
"Because". 

"DNS relies on caching heavily, so the algorithm described above is actually a
bit more complicated”

[RFC7626] I’m fine with this grammatical change.


> Because there is typically no caching in the stub resolver,

I think it's a stretch to call this "typical”.

[RFC7626] Suggest “s/typically/often/“

alternatively “s/no/no or only limited/"


The text then draws a parenthetical distinction between stub resolvers and 
"Applications" that is not grounded in fact:

> (Applications, like web browsers, may have some form of caching that does
> not follow DNS rules... 

[RFC7626] Can you clarify - are you disputing that they cache or that they don’t follow DNS rules?


This text doesn't seem helpful:

> At the time of writing, almost all this DNS traffic is currently sent
> in clear (i.e., unencrypted)...

Suggest striking, and combining with the following paragraph.

The document's focus is privacy concerns of the DNS for Internet users. Providing guidance on the deployed balance between clear text and encryption of DNS on the Internet seems useful. 


I also question the value of describing the concerns around TCP, QUIC etc.
It seems like anything published will be out of date by the time an RFC
appears. 

The DPRIVE working group spent a significant amount of time evaluating DTLS vs TLS vs STARTTLS on port 53 due to the concerns around backwards compatibility and scalability of encrypting the DNS. Switching a protocol from using UDP to a session based protocol for an infrastructure as large as the DNS is non-trivial. 

Even the citation currently in the draft:

"Today, almost all DNS queries are sent over UDP [thomas-ditl-tcp]"

is from 2014. 

Fair point. It is, however, based on DITL data, and since there is no standard for encrypting traffic to authoritative servers, this is highly unlikely to have changed. The data is collected every year though, so I will contact the authors since they may have a more recent study. Also, I suggest adding a reference to:

An End-to-End, Large-Scale Measurement of DNS-over-Encryption: How Far Have We Come? https://faculty.sites.uci.edu/zhouli/files/2019/09/imc19.pdf


I would suggest a purely factual formulation. Something like:

"The original DNS RFCs describe an unencrypted protocol over UDP... newer
RFCs provide for DNS traffic over reliable transports and encryption
[various citations]”

As mentioned above, this document describes deployment and use of DNS in addition to mentioning protocol specifications. 

> Another important point to keep in mind when analyzing the privacy
> issues of DNS is the fact that DNS requests received by a server are
> triggered by different reasons. Let's assume...

Suggest:

"Multiple DNS requests can be triggered by a single user-initiated action.
Let's assume…"

[RFC7626] DNS requests can be triggered for many reasons other than user-initiated action. For example, many OSs and applications use periodic DNS probing for connectivity and do periodic DNS lookups to discover if software updates are available.


Next in the text:

> Primary request: this is the domain name in the URL that the user
> typed, selected from a bookmark, or chose by clicking on an
> hyperlink.  Presumably, this is what is of interest for the eavesdropper.

The second sentence is not supported by fact and should be struck.

[RFC7626]  


The Introduction ends with this paragraph:

> It can be noted also that, in the case of a typical web browser, more
> DNS requests than strictly necessary are sent, for instance, to
> prefetch resources that the user may query later or when
> autocompleting the URL in the address bar.  Both are a big privacy
> concern since they may leak information even about non-explicit
> actions.  For instance, just reading a local HTML page, even without
> selecting the hyperlinks, may trigger DNS requests.

This text is redundant with the bullet points above. Suggest striking this
paragraph and adding "prefetch resources" to the list in the second bullet
point that has "JavaScript code, embedded images, etc"…

It also suggests there's "a big privacy concern" when it's not clear that the
threat is any worse than all of the other things going on in a browser.

[RFC7626] The bullet points refer to DNS requests that are necessary to provide the web page content. This paragraph discusses additional queries that are not strictly necessary to provide that core functionality. The distinction seems useful from a privacy perspective.



#
# 2.  Scope
#

> This document does not attempt a comparison of specific privacy
> protections provided by individual networks or organizations, it
> makes only general observations about typical current practices.

Suggest striking this paragraph, as the part about "typical current
practices" can't be supported by fact.

Suggest retaining this text, since the DPRIVE working group reached consensus on it.


#
# 3.1.  The Alleged Public Nature of DNS Data
#

> It has long been claimed that "the data in the DNS is public".

This quote needs a citation if included. It is even contradicted by text in
Section 2: "leakage of private namespaces…”.

The later paragraphs of this section seem redundant once the next section is
considered.

[RFC7626] and to quote Stephane from his email of 9 Jan:

“It is indeed an important tenet of the draft (as it was for RFC 7626).”



#
# 3.2.  Data in the DNS Request
#

This text seems purely speculative:

> Another important thing about the privacy of the QNAME is the future
> usages.  Today, the lack of privacy is an obstacle to putting
> potentially sensitive or personally identifiable data in the DNS.  At
> the moment, your DNS traffic might reveal that you are doing email
> but not with whom.  If your Mail User Agent (MUA) starts looking up
> Pretty Good Privacy (PGP) keys in the DNS [RFC7929], then privacy
> becomes a lot more important.  And email is just an example; there
> would be other really interesting uses for a more privacy-friendly
> DNS.

[RFC7626]  


Next:

> For the communication between the stub resolver and the recursive
> resolver, the source IP address is the address of the user's machine.
> Therefore, all the issues and warnings about collection of IP addresses
> apply here. 

This text doesn't seem to be quite correct. 

[RFC7626] In what way? Recursive resolvers see both the queries and IP addresses. 


> In both cases, the IP address is as sensitive as it is for HTTP
> [sidn-entrada].

Is this true? I am not sure--the statement seems so vague it's hard to
evaluate.

[RFC7626] It means that for the last two cases described in this paragraph, the IP address exposed to the authoritative server is the originating client's IP, not a recursive resolver's address, and so it leaks information in the same way as the source address in an HTTP connection.

Suggest “s/the IP address/the IP address originating queries to the authoritative server/"


#
# 3.2.1.  Data in the DNS payload
#

> There are anecdotal accounts of MAC addresses [1] and even user names
> being inserted in non-standard EDNS(0) options

This is missing citations (say for RFC 6891), but it's also just anecdotal and
should be struck.

The discovery of this practice was an important finding within the DNS community, which realised that client identifiers were being used in this way; I don’t see a problem with including this in an Informational RFC. I do note that the reference is incorrect - will replace with the correct one:

Will add the missing reference for RFC6891.
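
For context on where such identifiers travel, here is a minimal sketch of how options are laid out in the RDATA of an EDNS(0) OPT record as defined in RFC 6891 (a 2-octet option code, a 2-octet length, then the data). The option code 65001 and the MAC address below are made up for illustration; 65001 is simply taken from the range RFC 6891 sets aside for local/experimental use and does not describe any particular deployment.

```python
import struct
from typing import List, Tuple


def parse_edns_options(rdata: bytes) -> List[Tuple[int, bytes]]:
    """Split OPT RDATA into (option_code, option_data) pairs."""
    options = []
    offset = 0
    while offset < len(rdata):
        if offset + 4 > len(rdata):
            raise ValueError("truncated option header")
        # Each option starts with OPTION-CODE and OPTION-LENGTH, both 16-bit.
        code, length = struct.unpack_from("!HH", rdata, offset)
        offset += 4
        if offset + length > len(rdata):
            raise ValueError("truncated option data")
        options.append((code, rdata[offset:offset + length]))
        offset += length
    return options


# Hypothetical values, for illustration only.
HYPOTHETICAL_OPTION_CODE = 65001
made_up_mac = bytes.fromhex("020000000001")
rdata = struct.pack("!HH", HYPOTHETICAL_OPTION_CODE, len(made_up_mac)) + made_up_mac

for code, data in parse_edns_options(rdata):
    print(code, data.hex(":"))
```

Anything placed in such an option is visible to every party that can read the query, which is why the practice matters from a privacy perspective.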


#
# 3.3.  Cache Snooping
#

> Since this also is a reconnaissance technique for subsequent cache poisoning
> attacks, some counter measures have already been developed and deployed.

There should be citations here, otherwise strike this text.



#
# 3.4.  On the Wire
#
# 3.4.1.  Unencrypted Transports
#

> For unencrypted transports, DNS traffic can be seen by an
> eavesdropper like any other traffic. 

This text seems like it should be largely deleted. It's redundant given the
text "almost all DNS queries are sent over UDP..." in the Introduction. Or,
move the text from the Introduction to this section.

[RFC7626] It is one sentence to provide context, I don’t believe the repetition is detrimental to the readability of the document. 


> An important specificity of the DNS traffic is that it may take a
> different path than the communication between the initiator and the
> recipient.

I think this text in describing anything specific to DNS--it's just one client
and two servers.

I assume you mean ‘don’t think’?

[RFC7626] I’m not sure you have understood the point. The initiator and recipient here are those involved in an HTTP communication. It is correct that the DNS traffic may take a different path via one or more resolvers, forwarders, proxies, validators and authoritative servers. Indeed it is often true to say that there is more than one client and two servers in the full resolution path. Also, a DNS “client” (stub, resolver, forwarder, proxy or validator) may send multiple queries for the same or similar (A and AAAA) information at the same time to multiple upstreams over different paths.


> The best place to tap, from an eavesdropper's point of view, is
> clearly between the stub resolvers and the recursive resolvers,
> because traffic is not limited by DNS caching.

Not sure this is true--clearly just monitoring a busy fiber optic cable has
been an attractive tactic.

[RFC7626] But if your goal is to eavesdrop on DNS traffic from end users specifically, then this is true. 


#
# 3.4.2.  Encrypted Transports
#

> These issues are not specific to DNS, but DNS traffic is susceptible to
> these attacks when using specific transports.

Yes. Just put HTTPS in this section.

The section covers both DoT and DoH. I don’t understand your suggestion. 


> More specifically, (since the deployment of encrypted transports is
> not widespread at the time of writing) users wishing to use encrypted
> transports for DNS may in practice be limited in the resolver
> services available.  Given this, the choice of a user to configure a
> single resolver (or a fixed set of resolvers) and an encrypted
> transport to use in all network environments can actually serve to
> identify the user as one that desires privacy and can provide an
> added mechanism to track them as they move across network environments.

This paragraph doesn't seem true, and will certainly become less so over time.
Is there a citation?

As mentioned in another thread I have worked directly with several early adopters who configured their systems to use their own DoT resolver. 

In terms of users configuring specific resolvers, there are good privacy reasons to use as few resolvers as possible for resolution (whilst not being subject to a single point of failure), so this seems a genuine issue with DNS.


> Default configuration options for encrypted transports could in principle
> fingerprint a specific client application.

Is this text describing anything outside fingerprinting in general? Suggest
striking.

> If libraries or applications offer user configuration of such options
> (e.g.  [getdns]) 

Again, this text describes fingerprinting in general. The [getdns] reference
doesn't seem necessary, either. (It's a link to software by one of the
authors... not sure that should be ok in a consensus document).

For both these points the relevance is that new implementations of encrypted DNS are emerging and therefore add concerns about fingerprinting to DNS that didn’t exist when only UDP was used. As for the reference, this is an Informational RFC - as long as this reference is relevant there is no reason not to include it. A previous comment asked if there were examples of DNS implementations that allow configuration of these options so this reference was added. 


> Whilst there are known attacks on older versions of TLS the most
> recent recommendations [RFC7525] and the development of TLS 1.3
> [RFC8446] largely mitigate those.

This text doesn't seem specific enough to be helpful--readers should just read
up on TLS.

RFC7858 specifies DNS-over-TLS should use ’TLS 1.2 or later’ so giving context seems useful. 


> Traffic analysis of unpadded encrypted traffic is also possible
> [pitfalls-of-dns-encryption]

This seems like another general TLS/fingerprinting issue not specific to DNS,
and it also seems to be from 2014.

The paper cited is a well-known analysis titled “Pretty Bad Privacy: Pitfalls of DNS Encryption”. One of the things it shows is that even if traffic _to_ a recursive resolver is encrypted, if the traffic is not padded it can be combined with capture of unencrypted upstream traffic to infer the content of the encrypted query. It was a motivating factor in developing both RFC7830 and RFC8467.
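
To illustrate why padding helps, here is a minimal sketch of block-length padding, the strategy recommended in RFC8467 (queries padded to a multiple of 128 octets, responses to a multiple of 468 octets). The function name and the message lengths are illustrative assumptions, and the sketch ignores the 4-octet header of the Padding option itself.

```python
def padded_length(message_length: int, block_size: int) -> int:
    """Round message_length up to the next multiple of block_size."""
    if message_length < 0 or block_size <= 0:
        raise ValueError("lengths must be non-negative and block size positive")
    return -(-message_length // block_size) * block_size


QUERY_BLOCK = 128     # block size RFC8467 recommends for queries
RESPONSE_BLOCK = 468  # block size RFC8467 recommends for responses

# Two queries of different (made-up) lengths become the same size on the wire.
for length in (29, 45):
    print(length, "->", padded_length(length, QUERY_BLOCK))
```

Without padding, the two lengths differ and can be matched against the sizes of unencrypted upstream queries; with padding, an observer sees the same size for both.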


#
# 3.5.1.  In the Recursive Resolvers
#

> Recursive Resolvers see all the traffic since there is typically no
> caching before them.

This isn't true.

[RFC7626] This is true but I agree it could be better worded. Suggest: "In the common case of end users sending DNS queries (encrypted or not) to a recursive resolver, then that resolver is highly likely to see all the traffic from the users because there is typically little or no caching between stubs and recursive resolvers."
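
A toy model may make the asymmetry concrete. The sketch below is an assumption-laden simplification (a single fixed TTL of 300 seconds, no negative caching, documentation addresses as clients, a hypothetical class name): the resolver records every client query, but only cache misses go upstream.

```python
class CachingResolver:
    """Toy recursive resolver that tracks what it sees and what it forwards."""

    def __init__(self):
        self.cache = {}              # qname -> expiry time (seconds)
        self.seen_from_clients = []  # every (client, qname) the resolver observes
        self.sent_upstream = []      # only the queries that miss the cache

    def query(self, client, qname, now, ttl=300):
        self.seen_from_clients.append((client, qname))
        expiry = self.cache.get(qname)
        if expiry is None or expiry <= now:
            # Cache miss: this is all an authoritative server (or a tap
            # upstream of the resolver) would see, and without the client IP.
            self.sent_upstream.append(qname)
            self.cache[qname] = now + ttl


resolver = CachingResolver()
for t, client in enumerate(["192.0.2.1", "192.0.2.2", "192.0.2.1"]):
    resolver.query(client, "example.com.", now=t)

print(len(resolver.seen_from_clients), "queries seen by the resolver")
print(len(resolver.sent_upstream), "queries sent upstream")
```

The same reasoning underlies the earlier statement that the stub-to-resolver link is the most informative place to tap for end-user DNS traffic.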


#
# 3.5.1.1.  Resolver Selection
#

> In general, as with many other protocols, issues around
> centralization also arise with DNS.  The picture is fluid with
> several competing factors contributing which can also vary by
> geographic region.

This is not a privacy concern, but rather a more vague one about
centralization.

As many other comments have noted, privacy and centralisation are linked concerns. 


> An increased proportion of the global DNS resolution traffic being
> served by only a few entities means that the privacy considerations
> for end users are highly dependent on the privacy policies and
> practices of those entities

This is misleading. Centralized DNS only cuts off current entities /if/ it's
encrypted. Otherwise, everyone on the wire can collect the data, making the
"privacy considerations for end users" dependent on several more parties.

Suggest: “s/for end users are highly dependent/for end users are additionally highly dependent/"


> Many of the issues around centralization are discussed in
> [centralisation-and-data-sovereignty].

I am not sure this 2012 paper is relevant here.

If you feel strongly please suggest an alternative reference. 


#
# 3.5.1.1.1.  Dynamic Discovery of DoH and Strict DoT
#

> At the time of writing, efforts to provide standardized signaling mechanisms
> to discover the services

It's not clear that this is possible in a meaningful way. Suggest striking, as
the text is speculative.

The document outlines a fundamental limitation of the current use of encrypted transports. Highlighting active work (i.e. work that has been adopted by an IETF working group) seems useful.


> Note that an increasing numbers of ISPs are deploying encrypted DNS and
> publishing DNS privacy polices, for example see the Encrypted DNS Deployment
> Initiative [EDDI].

The EDDI list seems ok, but I'm not sure it warrants a reference here.
Certainly they do not focus on "publishing DNS privacy polices", though.

Suggest: “Note that an increasing number of ISPs are deploying encrypted DNS, for example, see the Encrypted DNS Deployment Initiative [EDDI]."


#
# 3.5.1.1.2.  Application-specific Resolver Selection
#

This section seems to contain no non-speculative information. Suggest
striking. Maybe the most direct criticism is that the entire section also
applies to Operating Systems.

This overlaps with the similar discussion with Ekr on OS vs Application. Note that TentaDNS, Yandex and Firefox and others all support application-specific DNS resolver selection today. The document previously had reference to some of these but they were removed during the most recent round of reviews. 


#
# 3.5.1.2.  Active Attacks on Resolver Configuration
#

This section describes why the section on "Dynamic Discovery" is misguided.
However, I believe it describes general security and privacy concerns that are
not specific to DNS, and should be struck. For example:

> In addition, if the client is compromised, the attacker can replace the DNS
> configuration with one of its own choosing.

An equivalent section existed in RFC7626. A reference to dnswasher (a DNS specific case of compromising a client) was previously in this section but was removed in the previous rounds of review. 


#
# 3.5.1.3.  Blocking of User Selected DNS Resolution Services
#

This section doesn't have a clear point. 

> The extent of the risk to end user privacy is highly dependent on the
> specific network and user context...

The section covers blocking of the user, by the user, of the network, by the
network, and DDoS? What does this have to do with privacy? For example, what
does the text about RFC7754 have to do with privacy?

This section states: “User privacy can also be at risk if there is blocking (by local
   network operators or more general mechanisms) of access to remote
   recursive servers that offer encrypted transports when the local
   resolver does not offer encryption and/or has very poor privacy
   policies."



#
# 3.5.1.4.  Encrypted Transports and Recursive Resolvers
#
# 3.5.1.4.1.  DoT and DoH
#

> Use of encrypted transports does not reduce the data available in the
> recursive resolver and ironically can actually expose more
> information about users to operators.  As described in Section 3.4.2
> use of session based encrypted transports (TCP/TLS) can expose
> correlation data about users.

I don't think this is correct in a general enough way to be written like this.

This is correct when comparing the data exposed to a resolver via UDP and TCP.


#
# 3.5.1.4.2.  DoH Specific Considerations
#

> DoH inherits the full privacy properties of the HTTPS stack and as a
> consequence introduces new privacy considerations when compared with
> DNS over UDP, TCP or TLS [RFC7858].

Suggest:

"DoH inherits the full privacy properties of the HTTPS stack. There are
additional metadata locations to consider in comparison to DNS over UDP, TCP 
or TLS [RFC7858]”.

Reasoning: any data in an HTTP header can also be placed in a TLS extension.

The more general point here is that the HTTP standard introduces new metadata (not just locations) that does not exist in the standards for the other transports.


> HTTPS presents new considerations for correlation...

No, these are all present in TLS as well, but it's true that HTTP headers are
a new metadata location (as I suggest above).

> The User-Agent and Accept-Language request header fields

Not clear this is going to be an issue. Of course, an application could
include some extremely damaging metadata in an HTTP header, but this is also
possible in a TLS extension.

> Utilizing the full set of HTTP features enables DoH to be more than an 
> HTTP tunnel

Not clear what this means.

> Implementations are advised to expose the minimal set of data needed to
> achieve the desired feature set

This seems like general advice not specific to HTTPS or DoH.

All the above text is lifted directly from RFC8484.  
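
To show concretely what "new metadata" means here, the sketch below builds a DoH GET request as described in RFC8484: the DNS query in wire format is base64url-encoded without padding and carried in the "dns" parameter, and the only header needed to request the DNS media type is Accept. The server URL and the User-Agent string are hypothetical; any further headers a client library adds are metadata with no counterpart in DNS over UDP, TCP or TLS.

```python
import base64
import struct


def build_query(qname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format (ID 0, RD set, class IN)."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def doh_get_request(url: str, qname: str, extra_headers=None):
    """Return (request_url, headers) for a DoH GET; nothing is sent."""
    encoded = base64.urlsafe_b64encode(build_query(qname)).rstrip(b"=")
    headers = {"Accept": "application/dns-message"}
    headers.update(extra_headers or {})
    return f"{url}?dns={encoded.decode('ascii')}", headers


# Hypothetical server URL and User-Agent value, for illustration only.
DOH_URL = "https://doh.example/dns-query"
minimal = doh_get_request(DOH_URL, "www.example.com")
chatty = doh_get_request(
    DOH_URL,
    "www.example.com",
    {"User-Agent": "ExampleBrowser/1.0", "Accept-Language": "en-GB"},
)
print(minimal)
print(chatty)
```

Both requests carry the same DNS query; only the second tells the resolver operator which application and language setting produced it.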


> At the extremes, there may be implementations that attempt to achieve parity
> with DoT

Not a justified use of "parity”.

Why not? DoH is effectively a delta on top of DoT.


> Some implementations have, in fact, chosen restrict the use of the
> 'User-Agent' header so that resolver operators cannot identify the
> specific application that is originating the DNS queries

That wouldn't actually do the trick. TLS ClientHello messages are also easily
fingerprinted.

I can reference the Mozilla bug ticket where this is discussed if you like.
Also just because other risks might exist doesn’t mean we shouldn't document and try to mitigate all the risks.  


> Privacy focused users ...

Not specific to DoH.

The rest of this sentence makes a specific comparison between DoT and DoH.


#
# 3.5.2.  In the Authoritative Name Servers
#

> Also, the end user typically has some legal/contractual link with the
> recursive resolver

Not sure this is true.

[RFC7626] s/typically/often/ and the text goes on to give specific examples.


In general, I'm not sure this section provides much useful analysis, but this:

> With the control (or the ability to sniff the traffic) of a few name
> servers, you can gather a lot of information.

seems to conflict with other parts of the document that claim encryption leads
to more identifying data. Both can be true, but the document does not state
the problem in one place and balance the concerns well.

Suggest: “you can gather a lot of information including the content of DNS queries."


Regards

Sara. 
-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
