Hello,

Thank you, John, for your detailed comments on the i18n aspect of this draft, which I admit I hadn't fully considered. I think you're right that, whatever approach is taken, it would make sense to add a short "Internationalization Considerations" section to state what the expected interaction is between this specification and non-ASCII addresses.

More comments inline below:

> Temporarily and for purposes of discussion, assume I agree with
> the above as far as it goes (see below). Given that, what do
> you, and the systems you have tested, propose to do about
> addresses that contain non-ASCII characters in the local-part
> (explicitly allowed by the present spec)? Note that lowercasing
> [1] and case folding are different and produce different results
> and that both are language-sensitive in a number of cases, what
> specifically do you think the spec should recommend?

I have not seen any specific examples of software which unintentionally converts characters to uppercase (although I can readily imagine such bugs/features), so I'm prepared to assume that the lowercasing logic can be safely limited to just the input strings which include only ASCII characters. My idea was for the client to make a reasonable effort to correct for a plausible (but rare) problem, so for the purposes of an experiment I think it is acceptable if this correction does not try anything more clever, like converting MUSTAFA.AKINCI@xxxxxxxxxxx to mustafa.akıncı@example.com (although mustafa.akinci@xxxxxxxxxxx should be tried).

> Also, do you think it is acceptable to publish this document
> with _any_ suggestions about lower-casing or "try this, then try
> something else" search without at least an "Internationalization
> Considerations" section that would discuss the issues [1] and/or
> some more specific recommendation than "try lowercase" (more on
> that, with a different problem case, below).
You are right that adding such a section could be of great benefit to at least some implementers, even if the discussion in that section is simply "Only try lower-casing when the input is all ASCII". If someone can come up with something more helpful than that brief statement, then I'd be very supportive of it.

> Dropping that assumption of agreement for discussion, I
> personally believe that this document could be acceptable _as an
> Experimental spec_ with any of the following three models, but
> not without any of them:
>
>    (i) The present "MUST not try to guess" text.
>
>    (ii) A recommendation about lowercasing along the lines
>    you have outlined but with a clear discussion of i18n
>    issues and how to handle them [2].
>
>    (iii) A clear statement that the experiment is just an
>    experiment and that, for the purposes of the experiment,
>    addresses that contain non-ASCII characters in the local
>    part are not acceptable (note that would also require
>    pulling the UTF-8 discussion out of Section 3 and
>    dropping the references to RFC 6530 and friends).

Perhaps you would settle for an option (ii.v) which is my lowercasing recommendation + a discussion of the i18n issues + that discussion being based on the experimental restriction of only applying the lowercasing logic to ASCII-only local parts. I hope that would be in keeping with your sensible suggestions above.

> ...
> e.g.,
>    U+0066 U+006F U+0308 U+006F and
>    U+0066 U+00F6 U+006F
> are perfectly good (and SMTPUTF8-valid) representations of the
> string "föo"
>
> Using the same theory as your lower case approach, would you
> recommend trying first one of those and then the other [3]?

That is tempting, but I accept that it may be too much unnecessary complexity to suggest or recommend it at this stage of the experiment.
I know that various ideas have been proposed for handling normalisation of local-parts more generally, and I think we should allow that work to progress separately, uncoupling it from the document at hand.

> The more I think about it, the more I'm convinced that the
> specification and allowance for UTF-8 [4] in the first bullet of
> Section 3 is unacceptable without either text there that much
> more carefully describes (and specifies what to do about) these
> cases or an "Internationalization Considerations" section that
> provides the same information. I suggest that anyone
> contemplating writing such text carefully study (not just
> reference) Section 10.1 of RFC 6530. Of course, simply
> excluding non-ASCII local-parts from the experiment, as
> suggested in (iii) above, would be an alternative. I have mixed
> feelings about whether it would be an acceptable one for an
> experiment. I am quite sure it would not be acceptable for a
> standards-track document when the EAI work and/or the IETF
> commitment to diversity are considered.

I think that excluding non-ASCII local-parts from just the extra lower-casing logic, and pointing out the complexity of case handling in non-ASCII contexts in a separate section as you have suggested, might address the outstanding concerns, without hindering diversity.

> ...
> [2] I note that, historically, the DNS community has been very
> reluctant to accept techniques that depend on or imply multiple
> lookups for a single perceived object and, separately, for
> "guess at this, try it, and, if that does not work, guess at
> something else" approaches. Unless those concerns have
> disappeared, the potential for combinatorial explosion when
> lower-casing characters that may lie outside the ASCII
> repertoire is truly impressive.

That's another reasonable point, thank you.
Hopefully it is mitigated, at least for the most part, by settling for only lower-casing characters for all-ASCII local-parts, avoiding the combinatorial explosion you mention. Also, this extra lower-casing step will only happen in the relatively rare situations where the input local-part contains at least one upper-case character (although I don't know in practice how many extra lookups that will lead to, on average).

Best regards,
Edwin
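
P.S. To make the "only try lower-casing when the input is all ASCII" idea concrete, here is a minimal sketch in Python. It is only an illustration of the restriction discussed above, not text from the draft: the function name candidate_local_parts is hypothetical, and the ordering (original string first, lower-cased variant second) is my assumption about how a client might apply it.

```python
def candidate_local_parts(local_part: str) -> list:
    """Return the local-parts a client might try, in order.

    Assumption (not from the draft): the original string is always tried
    first. A lower-cased variant is added only when the local-part is
    entirely ASCII and contains at least one upper-case character, so at
    most one extra lookup is generated.
    """
    candidates = [local_part]
    if local_part.isascii():
        # For an all-ASCII string, str.lower() only maps A-Z to a-z and
        # is not language-sensitive.
        lowered = local_part.lower()
        if lowered != local_part:
            candidates.append(lowered)
    # Non-ASCII local-parts are left untouched: no case folding and no
    # normalisation guesses are attempted.
    return candidates
```

With this sketch, "MUSTAFA.AKINCI" yields two candidates (the original and "mustafa.akinci"), while a local-part containing "ı" or a combining U+0308 yields only the original string, which keeps the number of lookups bounded at two.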