Re: Last Call: draft-duerst-mailto-bis (The 'mailto' URI Scheme) to Proposed Standard

"Martin J. Dürst" <duerst@xxxxxxxxxxxxxxx> · Mon, 08 Mar 2010 19:09:32 +0900

Hello John,

Many thanks for your comments. I'm sorry my reply is late.

On 2010/01/22 0:58, John C Klensin wrote:

--On Monday, November 30, 2009 07:38 -0800 The IESG
<iesg-secretary@xxxxxxxx>  wrote:

The IESG has received a request from an individual submitter
to consider  the following document:

- 'The 'mailto' URI Scheme '
    <draft-duerst-mailto-bis-07.txt>  as a Proposed Standard

The IESG plans to make a decision in the next few weeks, and
solicits final comments on this action.  Please send
...

Hi.

I thought I had sent notes on this some weeks ago, but can find
no record of having done so,

Neither could I, sorry.

so apologize for the late
submission.

The mailto specification exposes several of the problems with
the interaction between the URI model and the syntactic and
semantic conventions of assorted protocols, especially
protocols that were specified and deployed long before there
was such a thing as a URI.   In this case, that situation is
complicated by the observations that mailto URIs are very
widely deployed, making backward compatibility important, and
that the existing specification in RFC 2368.

Because of the lateness of this review, I'm ignoring issues
that I don't consider especially significant.   I believe that
the following issues _are_ significant:

(1) Special characters, particularly "+", and percent-encoding

The specification talks about the need to encode various
special characters, particularly characters that have reserved
meanings in the URI specification such as "%" and "/".  One of
the failings in prior mailto specifications was that the state
of "+" was left ambiguous wrt whether it needed to be encoded
or not.  "+" is heavily used in subaddress techniques and,
partially because of the interactions noted in Section 5 of
this document, has caused a problematic interaction with the
use of the same character as an encoding for a blank space in
web forms.  The problem is noted and discussed in more detail
in RFC 3696.

Despite the discussion in the third paragraph of Section 5,
the document leaves ambiguous whether the correct
representation of an email address like john+ietf@xxxxxxxxxxx
in a mailto URI is

    mailto:john+ietf@xxxxxxxxxxx      or
    mailto:john%2Bietf@xxxxxxxxxxx

Both of these are correct. There is no real ambiguity: all characters
not specially mentioned (including 'a'-'z',...) just stand for
themselves, and '+' is one of them. All such characters can also be
escaped, although that's not usually done (see below).

and whether either of those, if interpreted in a web form
context, is expected to be treated as

   john+ietf@xxxxxxxxxxx              or
   "john ietf"@example.com

The former.

both of which are valid addresses under RFC 5321 (see the
production for "qtextSMTP" there) -- the
"john\ ietf"@example.com form is not required.

I have added a sentence about subaddresses, i.e. I have changed:

When producing 'mailto' URIs, all spaces SHOULD be encoded as %20.

to:
      When producing 'mailto' URIs, all spaces SHOULD be encoded as %20,
      and '+' characters MAY be encoded as %2B.
      Please note that '+' characters are frequently used as part of
      an email address to indicate a subaddress, as for example in
      <bill+ietf@xxxxxxxxxxx>.

I hope this helps.
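
To make the '+' point concrete, here is a minimal sketch using Python's standard urllib.parse. The helper name mailto_address and the example.com domain are assumptions for illustration, not anything taken from the draft, and the helper only handles simple URIs without hfields.

```python
from urllib.parse import unquote, unquote_plus

# Placeholder URIs; example.com is an assumption for illustration.
plain = "mailto:john+ietf@example.com"
escaped = "mailto:john%2Bietf@example.com"

def mailto_address(uri: str) -> str:
    """Hypothetical helper: strip the scheme, undo percent-encoding once."""
    if not uri.startswith("mailto:"):
        raise ValueError("not a mailto URI")
    return unquote(uri[len("mailto:"):])

# Both forms denote the same address; '+' simply stands for itself.
addr_plain = mailto_address(plain)
addr_escaped = mailto_address(escaped)

# Form-style decoding (application/x-www-form-urlencoded) turns '+' into
# a space, which is the misreading a mailto consumer should avoid.
misread = unquote_plus(plain[len("mailto:"):])

print(addr_plain, addr_escaped, repr(misread))
```

The design point is only that URI unescaping and web-form unescaping are different operations, and that a 'mailto' URI calls for the former.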

It is also worth noting that, while

    mailto:joe@xxxxxxxxxxx           and
    mailto:joe%65@xxxxxxxxxxx

are considered equivalent in this specification,

No, not exactly.

      mailto:joe@xxxxxxxxxxx       and
      mailto:jo%65@xxxxxxxxxxx

are considered equivalent. Not just by this specification, but by RFC 
3986 to start with.

the email
addresses

    joe@xxxxxxxxxxx               and
    joe%65@xxxxxxxxxxx

are formally different and may have quite different semantics
(only the final delivery SMTP server knows).

I assume this still applies to the mail addresses

     joe@xxxxxxxxxxx       and
     jo%65@xxxxxxxxxxx

That's all well and good. Both

      mailto:joe@xxxxxxxxxxx       and
      mailto:jo%65@xxxxxxxxxxx

stand for

     joe@xxxxxxxxxxx

while to reach

     jo%65@xxxxxxxxxxx

you have to use

     mailto:jo%2565@xxxxxxxxxxx

The draft clearly says that '%' in an email address has to be escaped, 
and that's just what we are doing here. This may not really be easy, but 
it's clearly defined, and it's not rocket science. There's also an 
example, gorby%kremvax@xxxxxxxxxxx, in the example section.
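
A small sketch of that round trip, again with Python's urllib.parse. The helpers to_mailto and from_mailto and the example.com domain are hypothetical names chosen for illustration, and the choice to leave '@' and '+' unescaped is an assumption, not a requirement of the draft.

```python
from urllib.parse import quote, unquote

def to_mailto(address: str) -> str:
    """Hypothetical helper: escape a mail address exactly once."""
    # '@' and '+' are left literal here; quote() always turns '%' into %25.
    return "mailto:" + quote(address, safe="@+")

def from_mailto(uri: str) -> str:
    """Hypothetical helper: unescape exactly once (hfields not handled)."""
    return unquote(uri[len("mailto:"):])

# A literal '%' in the address must show up as %25 in the URI.
uri_literal_percent = to_mailto("jo%65@example.com")
uri_gorby = to_mailto("gorby%kremvax@example.com")

# Without the %25, the URI denotes the address with a plain 'e'.
addr_from_unescaped_form = from_mailto("mailto:jo%65@example.com")
addr_round_trip = from_mailto(uri_literal_percent)

print(uri_literal_percent, uri_gorby)
print(addr_from_unescaped_form, addr_round_trip)
```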

That ambiguity is not just an encoding issue and difficulty for
those who use subaddresses.  It creates a vector for potential
attacks that is not noted in Security Considerations

Could you expand on the 'potential for attacks'? I understand that a lot 
can go wrong with escaping if one isn't careful, but "going wrong" 
doesn't necessarily translate into "attacks".

(that
section concentrates more on social problems, such as address
harvesting and information exposure, than on actual attacks on
the mail protocols and system).  More generally, while the
document makes the observation

	"Care has to be taken both when encoding as well as when
	decoding to make sure these operations are applied only
	once."

(from the end of Section 2(1)) it does not discuss how that is
to be done,

I'm not sure any explanation is necessary. If you put a mail address 
into a mailto URI, you escape, if you take it out, you unescape.

nor does it note the risks of not doing it in
Security Considerations.  That is important because there is
some anecdotal evidence that the rule is widely violated,
especially in web applications that move information back and
forth between mailto URI and email address formats.

(2) I18n issues

While the authors have done a careful and thoughtful job of
trying to anticipate the needs of the long-term (i.e.,
post-experimental) EAI work, there are possible ambiguities
that are not considered in addition to the "alternate address"
issue mentioned in Paragraph 3 of Section 1.  Some of the
important ones of these are the non-ASCII equivalent of the
discussion above: Because RFC 5321 fundamentals that are not
changed (or proposed to be changed) by the EAI work imply that

    duerst@xxxxxxxxxxx
    dürst@xxxxxxxxxxx                and
    d%C3%BCrst@xxxxxxxxxxx

represent three different target mailboxes

Yes, they represent three different mailboxes. But the mailto URIs/IRIs:

    mailto:duerst@xxxxxxxxxxx
    mailto:dürst@xxxxxxxxxxx               and
    mailto:d%C3%BCrst@xxxxxxxxxxx

represent only two different mailboxes, namely
duerst@xxxxxxxxxxx for the first mailto URI, and dürst@xxxxxxxxxxx for 
the second and third one. The mailbox d%C3%BCrst@xxxxxxxxxxx would be 
denoted by mailto:d%25C3%25BCrst@xxxxxxxxxxx

(unless the final
delivery server makes some decision to the contrary).  Again,
extreme care about the sequencing of decoding and other
interpretation can bypass the problem, but the document is not
nearly cautious enough about this and especially the security
and "user surprise" implications of trying to do that in
distributed modules so that operations are performed out of
order and/or other than exactly once.

I have added the following paragraph to the security section:

Programs manipulating 'mailto' URIs SHOULD take great care to not
inadvertently double-escape or double-unescape 'mailto' URIs, and
to make sure that escaping and unescaping conventions relating to URIs
and relating to mail addresses are applied in the right order.

I hope this addresses your concerns.
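
What double-escaping and double-unescaping do in practice can be sketched as follows, using Python's urllib.parse. The example.com domain and the variable names are assumptions for illustration only.

```python
from urllib.parse import quote, unquote

PREFIX = "mailto:"

# A mailbox whose local part literally contains the text '%C3%BC'.
literal = "d%C3%BCrst@example.com"
uri = PREFIX + quote(literal, safe="@")

# Unescaping once recovers the intended mailbox.
once = unquote(uri[len(PREFIX):])

# Unescaping a second time silently selects a different mailbox.
twice = unquote(once)

# The mirror-image mistake: escaping twice on the producing side turns
# the non-ASCII mailbox into a URI for the literal-percent mailbox.
escaped_once = PREFIX + quote("dürst@example.com", safe="@")
escaped_twice = PREFIX + quote(quote("dürst@example.com", safe="@"), safe="@")

print(uri, once, twice)
print(escaped_once, escaped_twice)
```

In both directions the result is still a syntactically valid address or URI, so nothing fails loudly; the message is just addressed to a different mailbox.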

(3) Interactions between RFC 5321 and 5322.

The specification covers over the subtle differences between
envelope and header addresses, treating addr-spec and
?to=<hfvalue>  as effectively equivalent.  Differences between
the implications and semantics of the envelope/delivery address
and the header field "To:", which are quite clearly
distinguished in RFC 5321 and 5322 are ignored or prohibited.
Possibly that is a reasonable design choice, but it is not
discussed.  In my opinion, if the functionality the difference
implies is going to be inaccessible via the mailto URI, that
decision should be discussed, if only to prevent confusion,
poor implementations, and misuse.

Are you aware of any such poor implementations, or misuse? I'm only 
aware of the misuse of making the envelope and header To: different 
(often used by spammers), and same of course for From, although it 
doesn't apply here, because the spec says to ignore any from= in the 
URI. If yes, can you supply actual text?

Regards,    Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@xxxxxxxxxxxxxxx