Hashing local-parts of addresses (was: dane-openpgp 2nd LC resolution)

ned+ietf@xxxxxxxxxxxxxxxxx · Sun, 20 Mar 2016 09:48:42 -0700 (PDT)

(This is the first in what will hopefully be a series of review comments on the
latest version of the dane-openpgp specification. I'm breaking this up into
several different topics in hopes of keeping any resulting discussion focused
on the particular set of issues I've brought up.)

This is the first specification I'm aware of that hashes the local-part of an
address to produce a corresponding identifier. Not only have we never gone this
far before, we've actually tried to stay away from operations like address
comparisons that have similar, albeit more limited, semantics.

With regard to this operation, there has been extensive discussion of the
longstanding requirement that only agents with administrative authority over
the associated domain can "interpret" the local-part of an address.

Unfortunately, AFAICT this discussion has completely missed two fundamental and
vitally important points.

First, there's no way to define a mapping of local-parts to a new set of
identifiers *without* effectively interpreting the local-part! If you define
the mapping as the draft currently does, implicit in that definition is that
local-parts are case-sensitive. And similarly, if you convert the local-part to
lower (or upper) case, you're now assuming the local-part is case-insensitive.
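
To make the point concrete, here is a minimal sketch in Python. The helper name
identifier() is hypothetical, and SHA-256 is only a stand-in: the draft's exact
hash and any truncation are not reproduced here. It shows that either choice
about case bakes an interpretation into the mapping.

```python
import hashlib


def identifier(local_part: str, fold_case: bool) -> str:
    """Hypothetical helper: hash a local-part, optionally lowercasing first.

    SHA-256 is an assumption used for illustration only.
    """
    if fold_case:
        local_part = local_part.lower()
    return hashlib.sha256(local_part.encode("utf-8")).hexdigest()


# Hashing as-is: the two spellings map to different identifiers, so the
# mapping implicitly treats local-parts as case-sensitive.
raw_a = identifier("First.Last", fold_case=False)
raw_b = identifier("first.last", fold_case=False)

# Lowercasing first: both spellings map to one identifier, so the mapping
# implicitly treats local-parts as case-insensitive.
folded_a = identifier("First.Last", fold_case=True)
folded_b = identifier("first.last", fold_case=True)
```

Either way, the definition of the mapping has made a decision about what the
local-part means.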

And in the case of EAI, without some sort of normalization you're assuming that
different UTF-8 representations of the same string of characters correspond to
different recipients. (Which, as Harald Alvestrand and I both pointed out on
the IETF list, is technically untenable and needs to be addressed. My
suggestion was and is to specify that the same case-folding and normalization
algorithm used for IDNs also be employed here.)
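
A short sketch of the problem, in Python. The sample string is made up, and
the use of NFC is an assumption chosen for illustration; which normalization
form the specification should mandate is exactly what needs to be decided.

```python
import hashlib
import unicodedata

# The same visible string in two Unicode representations (illustrative data).
composed = "jos\u00e9"     # 'e with acute' as a single code point
decomposed = "jose\u0301"  # 'e' followed by a combining acute accent

# The UTF-8 octets differ, so hashing them as-is yields different identifiers.
hash_composed = hashlib.sha256(composed.encode("utf-8")).hexdigest()
hash_decomposed = hashlib.sha256(decomposed.encode("utf-8")).hexdigest()

# After normalizing both to one form (NFC assumed here) the octets agree.
norm_composed = unicodedata.normalize("NFC", composed)
norm_decomposed = unicodedata.normalize("NFC", decomposed)
```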

But - and this is the second fundamental point that AFAICT has been missed -
who is doing the interpreting? In one sense it's the consumer of the OPENPGPKEY
records in the DNS, and the discussion so far has focused on how such consumers
don't have the right to do that.

But who published those records? That would be the owner of the domain - you
know, the folks who *are* entitled to interpret the local-part of addresses in
whatever fashion they choose.

So when a domain owner publishes such records in the DNS, a reasonable way to
look at it is that they are effectively saying, "Everyone is allowed to
interpret the local-parts of our addresses as specified in this document in
this one narrow context." I'm pretty confident there's nothing in any standard
that forbids such a delegation of authority.

And once you realize this is what is going on, not only does it become clear
that this draft is *not* violating the longstanding rules about local-part
interpretation, it casts the decision not to normalize the local-parts to lower
(or upper) case in an entirely different light. By choosing not to normalize,
this specification is effectively restricting its own applicability to domains
with case-sensitive local parts. That is, IMO, a highly suboptimal choice - the
overwhelming majority of domains treat the local part in a case-insensitive
fashion, and so should the mechanism specified in this draft.

Or, to put this another way, the inherent limitations of using the DNS to
provide the mapping from address to PGP key restrict the domain of
applicability of this specification to domains with particular local-part
policies, and the way in which the local-part to DNS mapping is specified
determines which policies the specification supports. And while it seems
logical to support a policy that's known to be in wide use, the specification
also needs to be very clear that domains that employ case-sensitive local-parts
MUST NOT avail themselves of this mechanism.

What needs to happen here is that the specification be revised to make it clear
that this is what is going on: That by publishing such records a domain is
granting a limited right to interpret the local parts of its addresses.

(One can of course argue that a specification that fails to offer a solution to
case-sensitive domains, or to domains that employ various forms of
subaddressing semantics, is unacceptable. But I am emphatically not making that
argument. I have a number of grave reservations about this draft that I am
going to try to explain in subsequent messages, but this isn't one of them.)

There's also - as noted by Sean Leonard - a technical glitch in the current
specification: The local-part is not the correct input to the hash function. A
canonicalization step is needed because all of these addresses are
equivalent:

(1) first.last@xxxxxxxxxxx
(2) first . last @example.com
(3) "first.last"@example.com
(4) "\f\i\r\s\t.last"@example.com

(2) is equivalent to (1) because CFWS has no semantics, (3) is equivalent to
(1) because the enclosing quotes are not properly part of the address, and (4)
is equivalent to (1) because quoted-pairs are semantically equivalent to
just the quoted character.

I believe this is the entire list, so the obvious canonicalization to use
on the local-part portion of an address prior to lowercasing and hashing is:

(a) If the local-part is unquoted remove any whitespace around periods.
(b) Remove any enclosing double quotes.
(c) Remove any literal quoting.
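
Steps (a) through (c) can be sketched as follows, in Python. The function name
canonicalize_local_part() is hypothetical, and this is an illustration under
simplifying assumptions rather than a complete RFC 5322 parser: it does not
handle comments, and it assumes the local-part is either entirely unquoted or
a single quoted string.

```python
import re


def canonicalize_local_part(local_part: str) -> str:
    """Hypothetical sketch of canonicalization steps (a)-(c)."""
    s = local_part.strip()
    if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
        # (b) Remove the enclosing double quotes.
        inner = s[1:-1]
        # (c) Replace each quoted-pair (backslash + char) with the bare char.
        return re.sub(r"\\(.)", r"\1", inner, flags=re.DOTALL)
    # (a) Unquoted: remove any whitespace around periods.
    return re.sub(r"\s*\.\s*", ".", s)


# The four equivalent forms listed above, local-part only.
forms = [
    "first.last",
    "first . last ",
    '"first.last"',
    r'"\f\i\r\s\t.last"',
]
canonical = {canonicalize_local_part(f) for f in forms}
```

With these inputs all four forms collapse to a single string, which is then
what would be lowercased and hashed.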

I might be inclined to say that this rather technical matter can wait to be
resolved in a future update, but (1) implementations, once deployed, are
difficult to change, and according to the draft there are already incompatible
implementations out there, and (2) normalization needs to be revisited
anyhow, so why not fix this as well?

Finally, a couple of observations about terminology are in order. The current
text covering the hashing of local-parts begins with:

      The user name (the "left-hand side" of the email address, called
      the "local-part" in the mail message format definition [RFC5322]
      and the local-part in the specification for internationalized
      email [RFC6530]) is encoded in UTF-8 (or its subset ASCII).  If
      the local-part is written in another encoding it MUST be converted
      to UTF-8.

First, the left hand side of an email address is not a "user name" and should
not be referred to as such. (The entire address is in some cases a "user name"
of sorts, and in some cases the local-part is identical to some kind of login
credential. But neither of these is universally true, and more to the point,
none of this is relevant to the matter at hand.)

Second, it probably makes sense to note that local-part is an ABNF
production contained in a broader syntax, not just a name.

Third, the term "encoding" here is inaccurate; it should be charset.

That's all for now.

				Ned