[Last-Call] Artart last call review of draft-ietf-6man-rfc6874bis-02

Martin Thomson via Datatracker <noreply@xxxxxxxx> · Fri, 16 Sep 2022 12:31:50 -0700

Reviewer: Martin Thomson
Review result: Not Ready

As a bit of a preface here, I've been aware of this work for some time, but
have refrained from offering an opinion.  As liaison to the W3C, I felt like my
responsibility was to facilitate the conversation.  However, as Barry has asked
me to do a review, I feel obligated to set that attempt at neutrality aside.

The biggest issue here is that this document is very much unwanted by its
target audience, as is its antecedent, RFC 6874.  I share that concern.

I have clear, strong indications from folks at three major browsers
(conveniently, I am at W3C TPAC this week and so was able to talk with a few
people directly on this topic; inconveniently, I've not had a lot of time for
this review) that this change to the URI specification is not just something
that they don't want to implement, but that it is not good for the Web.  Public
communications from them will be somewhat more polite and circumspect, but it
has been made clear to me that this change is not wanted.

The IETF does occasionally publish specifications that don't end up being
implemented, but we usually look for signals that a protocol might be
implemented before even starting work.  Here, we have a strong signal that a
specification won't be implemented.  Mark Nottingham raises the same question
as point 1 in [1].  (I don't personally find his second and third points
to be especially problematic given adequate consultation, but even there, there
are a few concerns that I will outline below.)

I do want to give due credit to the authors - Brian in particular - for being
very open and forthright in their consultation with the affected constituency. 
They have been proactive and responsive in a nearly exemplary fashion.

Overall, I think that it would be better for the IETF to declare RFC 6874 as
Historic(al).  There might be some residual value in RFC 4007 from a diagnostic
perspective, but the use of zone identifiers in URIs seems fundamentally
incompatible with the goals of URIs.

I do recognize that HTTP (and the Web) is not the only protocol affected by
this sort of change.  The goal is to change all URI types.  However, I believe that
HTTP is pretty important here and I have a fair sense that the sort of concerns
I raise with respect to HTTP apply (or should apply) to other schemes.

---

There are a few technical concerns I have based on reviewing the draft.  Some
of these - on their own - are significant enough to justify not publishing this
document.

Inclusion of purely local information in the *universal* identity of a resource
runs directly counter to the point of having a URI.  This creates some very
difficult questions that the draft does not address.

For instance (1), the Web security model depends on having a clear definition
for the origin of resources.  The definition of Origin depends on the
representation of the hostname and it relies heavily both on uniqueness
(something a zone ID potentially contributes toward) and consistency across
contexts (which a zone ID works directly against).  Now, arguably the identity
of resources that are accessed by link-local URIs doesn't need and cannot
guarantee either property, but this is an example of the sort of problem that
needs to be dealt with when local information is added to a component that is
critical to web security.
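
As a rough illustration of the consistency problem (not taken from the draft),
the sketch below models an origin as a plain (scheme, host, port) tuple that is
compared verbatim.  The helper name, the interface names eth0 and en1, and the
bare "%" delimiter are all assumptions made for the example.

```python
def origin(scheme, host, port):
    """Simplified origin: a (scheme, host, port) tuple compared verbatim."""
    return (scheme.lower(), host, port)

# The same link-local server, reached from two clients whose local
# interface names (and therefore zone IDs) differ.  Names are hypothetical.
from_client_a = origin("http", "[fe80::1%eth0]", 80)
from_client_b = origin("http", "[fe80::1%en1]", 80)

# The host strings differ only in the zone ID, so the origins differ.
same_origin = from_client_a == from_client_b
```

Under these assumptions the two clients disagree about the origin of what is
physically the same server, which is the cross-context inconsistency described
above.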

For instance (2), in HTTP and several other protocols, servers depend on the
host component - as it appears in the URI - to determine authority.  If there
is no rule for stripping the zone ID from URIs, servers' hostname checks will
depend on the client.  That exposes link-local servers to information that they
need to filter out.  Some might not be prepared to do that.  Hostname checks
are critical for security, especially the consistent treatment of the field
across different components like serving infrastructure, web application
firewalls, access control modules, and other components.
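
A minimal sketch of the kind of filtering a server-side component would need,
assuming a bare "%" delimiter between the address and the zone ID and a host
value with no port.  The function names and the allow-list are hypothetical;
this is one possible approach, not a rule from the draft or from HTTP.

```python
def split_zone(host):
    """Split a bracketed IPv6 literal into (address, zone).

    Assumes a bare "%" separates the address from the zone ID.  Hosts that
    are not bracketed literals are returned unchanged with an empty zone.
    """
    if not (host.startswith("[") and host.endswith("]")):
        return host, ""
    address, _, zone = host[1:-1].partition("%")
    return "[" + address + "]", zone

def host_allowed(host_value, allowed_hosts):
    """Hypothetical authority check that ignores any client-supplied zone."""
    host, _zone = split_zone(host_value)
    return host.lower() in allowed_hosts

ALLOWED = {"[fe80::1]"}  # hosts this server considers itself authoritative for

# A verbatim comparison fails because the client added its own zone ID.
naive = "[fe80::1%eth0]" in ALLOWED

# Comparing after removing the zone ID succeeds.
stripped = host_allowed("[fe80::1%eth0]", ALLOWED)
```

Every component that inspects the host (serving infrastructure, firewalls,
access control) would need to apply the same treatment for the checks to
remain consistent.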

This is a non-backwards-compatible change to RFC 3986.  The only issue related
to this that is addressed in the draft is the question of document management -
this updates RFC 3986 - but surely there are other concerns that might need to
be addressed.  I see some effort to address software backwards-compatibility in
discussion threads, but I found very little in the draft itself.

The configuration of zones on a machine could be private information, but
this information is being broadcast to servers.  In HTTP, that is in Host
header fields; on the Web, in document.location.  It might contribute a
significant amount of information toward a fingerprint.  I appreciate that the
stripping of zone ID was never implemented, but it is a useful feature.

Arguments in Section 5 depend on the zone IDs being hard to guess, but that
isn't true.  Zone IDs are - in practice - low entropy fields.  More critically,
they are fields that are sent to servers.

Zone ID size is not bounded - most implementations will have a size limit on
the authority or host portion of a URI (256 octets is sufficient for current
names), but the implication is that Zone IDs could be of arbitrary length.
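
To make the interaction with existing limits concrete, here is a small sketch
of a host-length check.  The 256-octet figure is the one mentioned above; using
it as a hard cap, the function name, and the bare "%" delimiter are assumptions
for illustration only.

```python
MAX_HOST_OCTETS = 256  # figure from the text above; treated here as a hard cap

def host_within_limit(host, limit=MAX_HOST_OCTETS):
    """Return True if the host component fits within the octet limit."""
    return len(host.encode("utf-8")) <= limit

# A short, typical zone ID fits comfortably.
short_ok = host_within_limit("[fe80::1%eth0]")

# Nothing bounds the zone ID, so a long one pushes the host over the limit.
long_zone = "z" * 300
long_ok = host_within_limit("[fe80::1%" + long_zone + "]")
```

An implementation with such a limit would reject the second host even though
the address itself is well within bounds.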

Though percent-decoding is not likely to be a concern from a specification
perspective (the operative specification from the browser perspective does not
apply pct-decoding to a v6 address [2]), what work has been done to verify that
a zone ID won't break existing software?

[1] https://mailarchive.ietf.org/arch/msg/last-call/4vEKZosvKvqJ9cufSm5ivsCho_A/
[2] https://url.spec.whatwg.org/#concept-host-parser
