Re: [Last-Call] [art] Artart last call review of draft-ietf-6man-rfc6874bis-02

Brian E Carpenter <brian.e.carpenter@xxxxxxxxx> · Tue, 22 Nov 2022 10:09:46 +1300

A slight update on curl.

On Windows, it simply ignores the %xxx and uses the default zone.

On Linux, it *requires* the %xxx because Linux has no default zone. So actually it's Linux curl that already supports rfc6874bis completely, and Windows curl is weak.

Linux curl *also* supports the RFC6874 notation %25xxx, which creates an issue if one names a zone as "25".
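To make the "25" problem concrete, here is a minimal Python sketch of a hypothetical lenient splitter that accepts both notations. It assumes a leading "25" after the "%" is always read as the RFC 6874 encoding of the delimiter. This is not curl's code, and the name split_zone is made up for illustration.

```python
def split_zone(literal: str) -> tuple:
    """Split the inside of an IPv6 IP-literal into (address, zone).

    Hypothetical lenient parser accepting both forms:
      RFC 6874:     fe80::1%25eth0
      rfc6874bis:   fe80::1%eth0
    """
    addr, sep, zone = literal.partition("%")
    if sep and zone.startswith("25"):
        # Assume RFC 6874: "%25" is the percent-encoded "%" delimiter.
        zone = zone[2:]
    return addr, zone


print(split_zone("fe80::1%25eth0"))  # RFC 6874 form
print(split_zone("fe80::1%eth0"))    # rfc6874bis form
print(split_zone("fe80::1%25"))      # rfc6874bis form for a zone named "25": zone is lost
```

Both forms yield zone "eth0", but a zone literally named "25", written in the rfc6874bis form, comes out empty. A parser that guesses between the two notations cannot tell them apart.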

Regards
   Brian Carpenter

On 22-Nov-22 09:07, Brian E Carpenter wrote:
Hi Martin,

Please see a few comments in line:

On 21-Nov-22 17:48, Martin J. Dürst wrote:
Sorry to be late with this answer, but there are a few points below that
need to be corrected.

On 2022-10-05 12:02, Brian E Carpenter wrote:
Hi Dale,

Note that since the Last Call has ended, there's now a -03 draft
that attempts to respond to the actionable comments from various
reviews.
(https://datatracker.ietf.org/doc/draft-ietf-6man-rfc6874bis/)

More in line:

On 05-Oct-22 13:51, Dale R. Worley wrote:
I'm not an expert in this area, but it seems that these points can be
made:

2. It doesn't seem to be a philosophical problem that we define a type
of URI that can only be properly interpreted within a very small part of
the Internet.

This is definitely correct. There's no requirement that a URI has to be
dereferenceable everywhere. But while there's no philosophical problem
with that, it seems quite strange to change a very fundamental part of
URIs (percent escaping) for such a small use case.

There's no requirement that URIs be universally
interpretable; "U" stands for "uniform" as in uniform syntax.

Yes. And part of that uniform syntax is the uniformity of percent
escaping, which this proposal squarely ignores.

I believe that is not correct. I agree that there is some subtlety in
the description of percent encoding, and this is one of the reasons
that RFC6874 took a different (and wrong) approach. We assumed that
this sentence in RFC3986:

"2.1.  Percent-Encoding

     A percent-encoding mechanism is used to represent a data octet in a
     component when that octet's corresponding character is outside the
     allowed set or is being used as a delimiter of, or within, the
     component."

would apply to IP-literal, but it doesn't, because the proposed BNF
says it doesn't. Then we have:

"2.4.  When to Encode or Decode

     Under normal circumstances, the only time when octets within a URI
     are percent-encoded is during the process of producing the URI from
     its component parts.  This is when an implementation determines which
     of the reserved characters are to be used as subcomponent delimiters
     and which can be safely used as data."

An implementation (which in this case is usually a human!) has no reason
to determine that "fe80::a%eth0" needs percent-encoding, because the
proposed BNF says it doesn't. In that terminology, the "%" is
acting as a subcomponent delimiter, not as data, so it doesn't need
encoding.
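Here is a minimal Python sketch of that reading. It assumes the draft's ABNF is IPv6addrz = IPv6address "%" ZoneID with ZoneID = 1*( unreserved ), which is my reading of the draft rather than a quotation. The function name parse_ip_literal is hypothetical. The "%" is treated purely as a delimiter, so the ZoneID is taken verbatim and never percent-decoded.

```python
import ipaddress
import string
from typing import Optional, Tuple

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"   (RFC 3986, Section 2.3)
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def parse_ip_literal(host: str) -> Tuple[ipaddress.IPv6Address, Optional[str]]:
    """Parse a bracketed IPv6 host, assuming the rfc6874bis ZoneID syntax."""
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError("not an IP-literal")
    inner = host[1:-1]
    addr, sep, zone = inner.partition("%")
    if sep:
        # "%" acts as a subcomponent delimiter, not as data, so there is
        # no percent-decoding: the ZoneID must be 1*( unreserved ).
        if not zone or not set(zone) <= UNRESERVED:
            raise ValueError(f"invalid ZoneID: {zone!r}")
        return ipaddress.IPv6Address(addr), zone
    return ipaddress.IPv6Address(addr), None


print(parse_ip_literal("[fe80::a%eth0]"))
print(parse_ip_literal("[fe80::2e3a:fdff:fea4:cce7%7]"))
```

Under this reading, an old-style "[fe80::a%25eth0]" simply names the zone "25eth0". That is the compatibility question the curl observations above run into.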

(It was Andrew Cady who first clearly pointed this out last year:
https://mailarchive.ietf.org/arch/msg/ipv6/ocNXw2Tl7YnOXOVjnUJ_VS7PI88 )

In fact, neither our interactions with browser implementors, nor my brief
experience patching wget, have turned up any problems with this.

Incidentally, I just thought to try this command on my Windows box:

C:\WINDOWS\system32>curl http://[fe80::2e3a:fdff:fea4:cce7%7]

(That's the link-local address of my Fritz Box, slightly obfuscated.)

And guess what, it replied:

<!DOCTYPE html>
<html lang="en">
<head>
...
</script>
</body>
</html>
Thus, curl on Windows 10 already supports draft-ietf-6man-rfc6874bis,
and the Fritz Box web server seems happy with it.

Or for
that matter, that they might be interpreted differently in different
places.  There is vast elasticity regarding what it means to "identify"
a "resource".  (I've been involved in a working group that defined URNs
that were abstract properties, and would only be realized by comparing a
prioritized sequence of URNs against the signals that a device was
capable of producing.)

We agree. The -03 draft makes this point.

3. Given #2, it's not a problem that many implementations would be
unable to parse these URLs because their syntax is not
upward-compatible, as long as the beneficial use cases are generally
implemented.

It depends on what "unable to parse" means. Many parsers and other
software are written so that edge cases and errors get processed just
'somehow', possibly producing unexpected results.

True. But this doesn't matter in practice, as the -05 draft explains at
https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-05.html#name-scope-and-deployment

5. There's a significant amount of trouble because RFC 4007 chose "%" as
the delimiter for zone indexes but "%" has a special syntax in URLs.  In
principle, this shouldn't be a problem.  "%" is used as the first
character of "%xx" escapes, but within URLs, that's just a constraint on
the contexts in which "%" may be used.  Unfortunately, many people are
sloppy and e.g. consider the URL "http://example.com/foo-bar" to be
equivalent to "http://example.com/foo%2dbar", which leads to a lot of
software attempting to "normalize" URIs that contain "%".  But the fact
that such software would choke on URLs containing zone indexes doesn't
seem to be important, as we expect zone indexes to have limited use.

Such software isn't sloppy at all, it follows RFC 3986. Please see in
particular Section 2.4
(https://www.rfc-editor.org/rfc/rfc3986.html#section-2.4).
"http://example.com/foo-bar" and "http://example.com/foo%2dbar" are
equivalent. See also Section 6.2.2
(https://www.rfc-editor.org/rfc/rfc3986.html#section-6.2.2).
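As an illustration of both sides of this exchange, here is a minimal Python sketch of percent-encoding normalization in the style of RFC 3986, Sections 6.2.2.1 and 6.2.2.2. It decodes triplets that stand for unreserved characters and uppercases the hex digits of the rest. The function name is hypothetical. The sketch is deliberately naive and applies the rule to the whole string, host included, to show how such software can interact with a rfc6874bis zone ID.

```python
import re
import string

UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(uri: str) -> str:
    """Naive RFC 3986 style normalization applied to the entire URI string."""
    def fix(m):
        ch = chr(int(m.group(1), 16))
        if ch in UNRESERVED:
            return ch                      # 6.2.2.2: decode unreserved octets
        return "%" + m.group(1).upper()    # 6.2.2.1: uppercase hex digits
    return PCT_ENCODED.sub(fix, uri)


print(normalize_percent_encoding("http://example.com/foo%2dbar"))  # foo-bar
print(normalize_percent_encoding("http://[fe80::a%eth0]/"))        # unchanged
print(normalize_percent_encoding("http://[fe80::1%41]/"))          # [fe80::1A]
```

"%eth0" is left alone because "et" is not a hex pair. A numeric zone index such as 41, however, is silently rewritten into a different, equally valid-looking address, fe80::1A. This is the kind of breakage that a normalizer unaware of zone IDs could cause.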

I discussed 2.4 above. I don't see 6.2.2 as relevant to the various use
cases.

Correct. That's exactly why the necessary patch to wget is two lines of C.
(https://github.com/becarpenter/wget6/blob/main/wget-6874bis.md)
It's *significantly* harder for the browsers, since their parsers are much
more complex than wget, but your analysis seems to be spot on.

The unfortunate circumstance is that RFC 4007 has pretty much frozen "%"
as the delimiter character.  If we could change that, life would be
easier.  But there's a lot of deployed software and current practice
that would have to be changed.

Exactly. It's unfortunate, but at the time of RFC4007, nobody noticed
this gotcha.

The gotcha can't be fixed anymore. But for those who created it, it
might at least be possible to acknowledge it and compromise in a greater
context. If all the Windows users who are accustomed to '\' as a path
separator can change that to '/' in URIs, why is it so difficult, for the
very rare and localized case of zone IDs, to find a character other than '%'?

That was discussed, actually prior to RFC6874, and the consensus in 6MAN
was pretty clear - people want cut-and-paste, which means accepting "%".
Today that is even harder to change than it was a few years ago.
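The cut-and-paste argument can be shown with a small Python sketch. It uses two hypothetical helpers that turn an address copied from a tool's output (e.g. "fe80::a%eth0") into a URL. Under rfc6874bis the pasted text goes in unchanged. Under RFC 6874 the delimiter must first be rewritten as "%25", which is the step users kept getting wrong.

```python
def url_rfc6874bis(addr: str, path: str = "/") -> str:
    # rfc6874bis: the "%" from the pasted address is used as-is.
    return "http://[" + addr + "]" + path


def url_rfc6874(addr: str, path: str = "/") -> str:
    # RFC 6874: the zone delimiter must be percent-encoded as "%25".
    return "http://[" + addr.replace("%", "%25", 1) + "]" + path


pasted = "fe80::a%eth0"
print(url_rfc6874bis(pasted))  # http://[fe80::a%eth0]/
print(url_rfc6874(pasted))     # http://[fe80::a%25eth0]/
```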

Thanks
      Brian

Regards,   Martin.

6. To get full usage of the new syntax, both the browsers and servers
that would be accessed by link-local addresses need to be changed.  In
practice, the browsers are likely to be general-purpose but the servers
are likely to be resident on a small subset of devices that are
self-consciously network devices.

That's correct. As now noted in the draft, in some use cases even an
HTTP error response is a fine result for diagnostic purposes, because
it confirms connectivity.

Regards
       Brian
