> On Jun 5, 2015, at 2:05 AM, Mark Nottingham <mnot@xxxxxxxx> wrote:
>
> Hi Roy,
>
>>>> My overall concern here is that statements like this undermine the
>>>> integrity of the organization. I understand people wanting to improve
>>>> overall privacy, but this step does not do that in any meaningful way.
>>>
>>> Encrypting the channel does provide some small amount of privacy for
>>> the *request*, which is not public information. Browser capabilities,
>>> cookies, etc. benefit from not being easily correlated with other
>>> information.
>>
>> That is message confidentiality, not privacy. Almost all of the privacy
>> bits (as in, which person is doing what and where) are revealed outside
>> of the message.
>
> There's been a lot of historic confusion (or maybe it's just different
> jargon for different communities) about confidentiality and privacy in
> protocols; I'm assuming Joe meant, roughly, "…provide a small amount of
> privacy by making the request confidential…"

I assumed so too, but there is a huge difference between privacy and just
hiding some of the header field content and query data in each request.
The problem with this campaign is that it is using the term "privacy" as
if HTTPS provides it, but it does nothing of the sort. For most people, we
let such confusion slide because they aren't expected to understand how
the protocols work as a system. The IESG should at least understand the
protocols if it is going to be conducting campaigns in our name.

>>> It would be interesting to define an HTTP header of "Padding" into
>>> which the client would put some random noise to pad the request to a
>>> well-known size, in order to make traffic analysis of the request
>>> slightly more difficult.

This is the sort of thing that comes up when we talk about doing more
encryption for the IETF's data, which shows the IESG's suggested approach
to be completely rational.

> HTTP/2 has padding built into the relevant frames.
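As a purely illustrative aside, the "Padding" header idea could be sketched as follows: pad each serialized request up to the next fixed-size bucket so that message length reveals less to a traffic analyst. "Padding" is not a registered header field, and the bucket size and function name here are assumptions of the sketch, not any specification.

```python
# Sketch of the hypothetical "Padding" header: append random filler so
# the serialized request length is a multiple of a fixed bucket size.
# Purely illustrative; "Padding" is NOT a registered header field.
import secrets
import string

BUCKET = 512  # illustrative bucket size in bytes


def pad_request(request: bytes) -> bytes:
    """Insert a Padding header so len(result) is a multiple of BUCKET.

    Assumes `request` is a serialized HTTP/1.x request whose header
    block ends with a blank line (CRLF CRLF).
    """
    head, _, body = request.partition(b"\r\n\r\n")
    # Bytes added besides the filler itself: "\r\nPadding: " (11 bytes).
    overhead = len(b"\r\nPadding: ")
    needed = -(len(request) + overhead) % BUCKET
    filler = "".join(
        secrets.choice(string.ascii_letters) for _ in range(needed)
    ).encode()
    return head + b"\r\nPadding: " + filler + b"\r\n\r\n" + body
```

HTTP/2 does this natively: DATA and HEADERS frames carry a Pad Length field, so no header hack is needed there.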
> In HTTP/1.x, padding is sometimes done with (unregistered) headers, but
> more often is done with chunk-extensions. Don't think anything needs to
> be registered here.
>
>> Browsers don't send singular messages containing anonymous information.
>> They send a complex sequence of messages to multiple parties with an
>> interaction pattern and communication state. The more complex and
>> encrypted the communication, the more uncommon state and direct
>> communication is required, which makes it easier to track a person
>> across multiple requests until the user's identity is revealed.
>
> +1. I'm very interested to see the research that showed this so clearly
> for HTTP/1.x over TLS repeated for HTTP/2, since it has multiplexing and
> usually uses a single connection per origin. I suspect that it's better,
> but certainly not proof against these kinds of attacks.
>
>> Furthermore, with TLS in place, it becomes easy and commonplace to send
>> stored authentication credentials in those requests, without
>> visibility, and without the ability to easily reset those credentials
>> (unlike in-the-clear cookies).
>
> Yes. This is a concern that I talked through with Balachander
> Krishnamurthy (who said his cookie research would have been much more
> difficult with pervasive HTTPS) and others when SPDY came around. I
> think we need much better tooling here. There has been a bit of
> progress, but it's been very slow...

I don't think you appreciate the impact of authenticated requests on the
overall system. It isn't just that the sites you intend to visit now have
the ability to uniquely identify you at no additional infrastructure
cost. It is that every https reference on every page has the same
ability, and is no longer hindered by limitations on Referer or "privacy"
concerns (again, because people like the IETF claim that encrypted data
sent over TLS is private even when we have no control over the CAs, the
recipient, and the data sent).

>> Padding has very little effect.
>> It isn't just the message sizes that change -- it is all of the
>> behavior that changes, and all of the references to that behavior in
>> subsequent requests, and the effects of those changes on both the
>> server and the client.
>
> Padding may not be sufficient to be proof against information leakage,
> but it is sometimes necessary. It may have little effect in the
> scenarios you're thinking of, but it's still useful against some
> attacks.
>
>> TLS does not provide privacy.
>
> No protocol "provides" privacy in the sense you're talking about. TLS
> helps to maintain privacy in certain scenarios.

Yes, but not the scenario described by an Internet retrieval of an
"https" schemed resource identifying public information that does not
require user authentication or persistent cookies to GET. That is the
added scope of what people mean by HTTPS-everywhere, since HTTPS itself
is not a named protocol and we already recommend whatever is
HTTPS-obvious.

> Given the news over the last two years (to almost the day!) and the
> nature of the attacks we're talking about (where your access to public
> information can be strung together to learn many things about you),
> it's not surprising that it's being discussed.

Yes, but again -- using a significant event like Snowden's release of
information about mass surveillance to justify HTTPS-everywhere presumes
that HTTPS-everywhere is an actual defense against mass surveillance, or
at least enough of an improvement to justify its cost. While
confidentiality is necessary in many cases, and more than justified by
those cases, it is not necessary in all cases. Furthermore, a user's
privacy can be reduced by insisting that HTTPS be used in all cases,
because "https" hides what each page decides to send over the connection,
increases the amount of metadata pointing directly at the user agent, and
extends the duration of exchanges.

Encryption works.
That does NOT mean that performing Web retrievals using TLS hides the
information necessary to track exactly who you are, what you are doing,
and how long you are doing it. It can hide other things: things that have
been considered important to hide long before mass surveillance became a
rallying cry (as odd as that sounds). Nor does it mean that, when
encryption is useful, TLS is the right protocol with which to apply it.

Avoiding mass surveillance is a lot harder. It requires specialized
behavior by the user agent, not just encrypting communication. It
requires better protocols for name services, routing, and avoidance of
long-lived connections. These are also within the scope of the IETF. But
what we are being told, instead, is that "https" will somehow address the
problem if we all click our heels together at the same time. It's a
disgrace. The problem isn't that we lack the ability to combat mass
surveillance.

> Using more TLS to achieve confidentiality *will* result in more privacy
> from a pervasive network attacker — it just won't help against an
> attacker (even with the best of intentions or the dodgiest of business
> models) at the other end of that connection (which I absolutely agree
> that the IETF and W3C should be thinking about as well).

A pervasive network attacker is at both ends of the connection, and
behind the connection, and watching state before and after the
connection. Pervasive is pervasive.

>> What it does is disable anonymous access to ensure authority.
>
> Please explain?

The https scheme relies on the notion of authority in the URI, combined
with a direct or tunneled connection to that authority, to establish a
trusted exchange of information between the user and that authority
(assuming that the user trusts that authority). For various performance
reasons, a great deal of state is held on the user agent to ensure that
its next connection to the same authority isn't depressingly slow.
Recipients are discouraged from shared caching or mirroring of the
content, since the authority is vested only in the connection that
delivered it, not in the bits that were delivered, and the user agent
doesn't know why the bits were secured.

Anonymous access, in contrast, does not presume that the user trusts the
authority. Very little state is maintained on the user agent, since it
doesn't actually help. Recipients are encouraged to cache or mirror the
content, especially if the content itself is signed, which means other
users can access the content without making a request to the authority.
Information can be replicated and accessed at locations the user does
trust, perhaps even offline.

The other advantage that replication has over https, aside from not
requiring a connection to the authority, is that the information cannot
be personalized. If you can go to a public library to see a copy of the
tax code, or legal code, or some other document of public interest, it
makes it much harder for that code to be changed without people noticing,
or for certain viewers of the code to see a different version than
others.

>> It changes access patterns away from decentralized caching to more
>> centralized authority control.
>
> I think the combination of how HTTP is defined and Web browsers'
> specific usage patterns of HTTP over TLS does that. We're already
> seeing some background discussion of how to offer caching without
> sacrificing security.

We can't have a reasonable comparison of the effect of HTTPS-everywhere
based on proposals that are deployed nowhere. Deploy them first, advocate
later.

>> That is the opposite of privacy.
>
> No, it's the opposite of anonymity. The most relevant definition of
> privacy I've seen was brought up on the Human Rights mailing list a
> little while back:
>
>   <http://www.internetsociety.org/blog/2013/12/language-privacy>
>
> … and it's much more nuanced than that.
Of course it is more nuanced than that, but I certainly won't be looking
at a definition of "about privacy" to define lack of privacy (they are
different things). My point was that forcing people into an interaction
pattern involving the authority of a given set of information, for every
bit of information that person might want to access, does not preserve
the user's privacy.

> That said, I agree that forced de-anonymisation and centralisation
> *can* both be privacy-hostile. I don't think it follows that more TLS /
> HTTPS equals less privacy, however.

HTTPS where it isn't needed results in a more centralized system, with
less privacy for anyone participating in that system. This is a
frequently repeated pattern that can be observed right now in any of the
walled gardens.

>> TLS is desirable for access to account-based services wherein
>> anonymity is not a concern (and usually not even allowed). TLS is NOT
>> desirable for access to public information, except in that it provides
>> an ephemeral form of message integrity that is a weak replacement for
>> content integrity.
>
> I think reasonable people can disagree here. When faced with a
> pervasive attacker (whether it be a government or a network provider)
> who can use your access to public information against your will, it
> *is* desirable.

Sorry, https does not help there. TCP and state observation are more than
sufficient. TLS does help when it is used in a completely different way
(securing connections to trusted privacy-filtering and re-routing
intermediaries, for example).

> As an aside, the World Economic Forum has classified personal data —
> presumably including browsing habits — as a "new asset class." One
> could argue that by browsing without encryption, you're literally
> giving money to anyone on the path who wishes to extract it.

The same is true for browsing with encryption. Furthermore, if everything
is encrypted, then the presence of encryption alone no longer signals
that special handling of the data is warranted.
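The distinction drawn above between ephemeral message integrity and content integrity can be sketched in a few lines: a publisher announces a digest of a public artifact, and any mirror, cache, or library can serve the bits, because the client verifies them independently of the transport. The function names are illustrative assumptions; a real design would sign the digest with the publisher's key (the "signed artifacts" idea) rather than rely on a bare hash.

```python
# Sketch of "authority vested in the bits, not the connection":
# the client checks the content itself against a publisher-announced
# digest, so it does not matter which mirror delivered the bits.
# Names are illustrative, not a real protocol.
import hashlib


def publish(content: bytes) -> str:
    """Publisher side: announce a digest of the artifact, e.g. in a
    signed catalog or via some out-of-band channel."""
    return hashlib.sha256(content).hexdigest()


def verify_mirror_copy(content: bytes, announced_digest: str) -> bool:
    """Client side: accept the bits from any untrusted mirror, and
    verify them independently of how they were retrieved."""
    return hashlib.sha256(content).hexdigest() == announced_digest
```

With this shape, a tampered or personalized copy fails verification no matter how "secure" the connection that delivered it was, which is the point of content integrity over connection integrity.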
>> If the IETF wants to improve privacy, it should work on protocols that
>> provide anonymous access to signed artifacts (authentication of the
>> content, not the connection) that is independent of the user's access
>> mechanism.
>
> Could you expand upon this a bit? I can think of many potential projects
> along these lines (and even have one or two brewing), but I'm not quite
> sure what you're getting at.

See above, or just look at reasonably good systems that are actually
designed to protect privacy (like Tor).

>> I have no objection to the IESG proposal to provide information *also*
>> via https. It would be better to provide content signatures and
>> encourage mirroring, just to be a good example, but I don't expect
>> eggs to show up before chickens. However, I agree with Tony's
>> assessment: most of the text is nothing more than a pompous political
>> statement, much like the sham of "consensus" that was contrived at the
>> Vancouver IETF. TLS everywhere is great for large companies with a
>> financial stake in Internet centralization. It is even better for
>> those providing identity services and TLS-outsourcing via CDNs.
>
> *sigh* I'm always disappointed when people smear others' motivations
> without facts to back it up.

I am disappointed with engineers who think it is appropriate to arrange
an array of joint meetings with the TLS working group wherein hums are
conducted to contrive a political statement that is later claimed to
represent IETF consensus, as opposed to the repeated consensus of the one
working group which happens to develop TLS. I am also disappointed that,
once that self-serving political statement was arranged, it has been used
repeatedly to mislead other organizations that are less savvy about how
the Internet actually works, but wouldn't dream of opposing "privacy".
Who would?
If you are going to conduct a political campaign, I expect to see
reasonable and responsible disclosures of the profit motive, even if it
has no personal relevance to the person disclosing. Readers can reach
their own conclusions.

It is a fact that https is considerably harder to scale than http for
hosted services, in terms of CPU, bandwidth, congestion-sensitivity,
cache effectiveness, and longevity of connections. It is a fact that few
companies specifically sell such services to others and would have
difficulty not benefiting from more customers. It is a fact that https
services are currently sold at a premium, as compared to http services,
so even existing customers making the switch will inevitably result in
increased revenue. And it is a fact that there are far fewer
organizations with sufficient competence to do it right, which will (at
least temporarily) create a competitive advantage.

Likewise, various publications exist that describe efforts to create
persistent advertising identifiers that are not subject to cookie
clearing -- specifically, identifiers tied to services that the user
would not want to discard. The problem with such identifiers is that
sending them in the clear, even in a hashed form, would leave an obvious
user trail. If everyone is using https, they don't need to be sent in
the clear any more; however, the trail is still there (and fully within
the reach of pervasive surveillance). The IETF should not call that
"privacy".

Furthermore, when search engines switched to https (rightly so) to
preserve the confidentiality of user-provided query data, they lost the
ability to pass that query information as Referer data to downstream
non-https sites. That's no longer a problem in the land of
https-everywhere.
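The Referer behavior described here follows from the rule in RFC 7231, Section 5.5.2: a user agent must not send a Referer header field in an unsecured request if the referring page was received over a secure protocol. A minimal sketch of that rule (the function name and the scheme-only simplification are mine):

```python
# Sketch of the Referer downgrade rule (RFC 7231, Section 5.5.2):
# suppress Referer when going from a secure page to an insecure target.
from typing import Optional
from urllib.parse import urlsplit


def referer_for(referring_page: str, target: str) -> Optional[str]:
    """Return the Referer value to send, or None when it must be
    suppressed (https referrer, non-https target)."""
    if (urlsplit(referring_page).scheme == "https"
            and urlsplit(target).scheme != "https"):
        return None
    return referring_page
```

So an https search result page leaks no query data to an http destination, but once the destination is also https, the Referer (query string and all) flows downstream again.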
> E.g., from what I've seen of CDNs (take my personal view as you will),
> it's not the nirvana you paint; scaling TLS is still difficult (thanks
> to the lingering lack of support for SNI), and there's a growing
> expectation in the market that HTTPS will cost the same as, or only a
> small increment over, serving HTTP (despite the consumption of IP
> addresses that it requires to serve a broad set of clients well from a
> highly distributed set of servers).

I only get to hear the complaints of actual enterprise customers. YMMV.

>> It's a shame that the IETF has been abused in this way to promote a
>> campaign that will effectively end anonymous access, under the guise
>> of promoting privacy.
>
> How does HTTPS "end anonymous access"?

Because https-everywhere eliminates anonymous access: not just in the
technical leaks that result from all that authenticating of the
authority, but also in the social effects it has on the overall
ecosystem. It excludes the features of HTTP that encouraged shared
caching (by default) and removes the social and technical barriers
associated with persistently identifying each user.

If we are going to make grand recommendations that change the way the
Web works, we should at least understand the consequences. If we are
going to tell people that something will improve privacy, then it had
better improve privacy to the same degree that we say it does.

....Roy