--On Wednesday, April 09, 2014 12:07 -0700 The IESG <iesg-secretary@xxxxxxxx> wrote:

> The IESG has received a request from the Preparation and
> Comparison of Internationalized Strings WG (precis) to
> consider the following document: - 'PRECIS Framework:
> Preparation and Comparison of Internationalized Strings in
> Application Protocols'
> <draft-ietf-precis-framework-15.txt> as Proposed Standard
>
> The IESG plans to make a decision in the next few weeks, and
> solicits final comments on this action. Please send
> substantive comments to the ietf@xxxxxxxx mailing lists by
> 2014-04-23.

IESG,

I have deliberately waited until the end of the Last Call period as posted [1], hoping that you would get (or generate) more focused comments on the draft in the interim. The issues discussed in this note were raised while what became the PRECIS WG (originally known by names like "Stringprep-bis") was being discussed and chartered, and again on several occasions during the meetings of the WG. At least in my opinion, they were never discussed seriously: the WG made one important improvement (shifting to a rule-based inclusion approach) but has essentially ignored the fundamental problem.

Because the predecessors of this note have not been actively considered in the WG or while its charter was being created, I don't actually expect it to accomplish anything other than to get some issues on the record. But I think the circumstances require one last try.

This note consists of a high-level summary with details in footnotes on its various points. It is as short as I can make it without leaving out issues and explanations that I believe are important.

Summary: I see two fundamental problems with approval of this specification and some issues with the decisions it includes. The first of these is temporary; the second more fundamental.

(1) After the PRECIS WG task list settled down into its charter [2], the goal became, in essence, to replace Stringprep with something more explicitly rule-based and to update (or advise on) the various uses of Stringprep profiles. As described in the PRECIS charter, that approach noted the relationships among the various parts of the PRECIS work and called for handling the "framework, profile replacements, and guidelines ... in parallel as much as possible". The interrelated nature of the pieces makes it inappropriate for the IESG to approve the "framework" document for publication on the standards track at this time because doing so would foreclose some reasonable discussion of issues with the other specs -- exactly the situation the charter language was intended, at least in my recollection, to prevent. The approval and publication of RFC 6885 over a year ago has already been used that way. Some of the decisions reflected in the current draft illustrate why more testing against specific profile examples and guidelines is important [5].

Recommendation: Tentatively approve the document now if that seems otherwise justified (I suggest below that it is not) but do not issue a Protocol Action Notice or hand it off to the RFC Editor until after a sufficient number of "profile replacements" and "guidelines" have been completed and examined through the IETF Last Call process to validate the "framework" provisions. The analogy of such a request to "running code" requirements should be obvious.

(2) The longer-term problem is the one that PRECIS simply refused to address. When designing systems with direct effects on and interactions with users, predictability based on prior user experience and expectations is vital. That predictability is often expressed as the Law of Least Astonishment. In the case of use of non-ASCII characters in Internet applications, "least astonishment" implies, first, that rules should be intuitive and easily understood to the extent possible.
Second and more important, if a user understands the rules of one application, she should be able to accurately extrapolate from that application to others. If there is a reason why that extrapolation does not work, the reasons should be clearly expressed and sufficiently comprehensible to be easily incorporated into the user's cognitive framework... and there should be few such cases. With the addition of "IdentifierClass" and "FreeformClass" to what, in its language, might be called "IDNA2008Class", we already have between one and two "Classes" too many for good user predictability. The language that indicates that additional classes might be defined in future specifications is not nearly strong enough in discouraging additional classes and explaining why they should be discouraged.

The PRECIS framework appears to fail those tests. It can be read as actively encouraging multiple profiles [3]. This is true even though the third bullet under "It is expected that this framework will yield..." acknowledges "more accurate expectations about the characters that are acceptable in various contexts" as an explicit objective.

Recommendation: Hold this document until the various categories are exercised by at least a sampling of applicable profiles. Make the problems with excessive profiles explicit. Impose requirements on additional profiles that include an explanation of why they are needed, given the user-level interoperability and astonishment problems that can result from even three of them, and require a higher level of review and consensus for creating new ones than "expert review". Modify the Security Considerations section to identify user astonishment and consequent confusion about what rules are being applied in a given PRECIS-User protocol as a risk and possible attack vector.

thanks,
   john

-----------------------------

[1] Despite the apparently-automatic announcement dated April 23, 2014 00:07 -0700 and indicating that the last call period had ended and the document state changed, the announcement, quoted above, says "by 2014-04-23". To the best of my knowledge, there has never been an announcement to the community, much less one backed by community discussion and consensus, that "by <date>" now means "00:07 California time on that date". If anything, we have tended to interpret "by <date>" as COB on that date or the last minute of that date anywhere in the world.

[2] http://datatracker.ietf.org/wg/precis/charter/

[3] Not only should multiple profiles be discouraged for this type of case by the Law of Least Astonishment, but general IETF experience indicates that profiles are a bad idea and that protocols that depend on them (and use a significant variety) tend to be less successful than those that do not.

[4] Sections 3.2.2 and 3.3.2 of -15.

[5] There are issues with the two classes that are defined in this document that have not, I believe, been carefully addressed in the WG. At a minimum, they need to be considered more carefully in context with specific examples without preempting the choices to be made for those examples. The following are examples, not an exclusive list:

(i) The "Contextual Rule" categories of IDNA2008 have proven to be extremely problematic, confusing, and controversial enough to be one of the key issues in the UTR 46 battles that have impeded IDNA2008 adoption in some important communities. While I still believe that the IDNA2008 decision was the right one, inclusion of those characters as Valid for both classes [4] should, given those bad experiences, require significantly more explanation and/or justification than this draft appears to provide.
If the intent of incorporating this list in PRECIS is to provide for compatibility with IDNA2008, the list should be incorporated by reference (either to RFC 5892 and its successors or to a separate document that updates and replaces the list in RFC 5892 as well as being used for PRECIS), not be provided as a list of code points that could cause the two definitions to diverge if changes are made to one or the other. The issue may apply more generally to the other characters listed in Section 7.6 and to other IDNA-derived elements of Section 7.

(ii) Characters with compatibility equivalents are problematic, in large measure because of a Unicode issue that has been extensively discussed in the past. Basically, the compatibility equivalency category mixes several unrelated and incompatible (sic) groups of things under a single label. One extreme is occupied by characters used with East Asian scripts that are distinguished only by their widths. If the Unicode design principles of "characters, not glyphs" and "plain text" had been followed, these different-width variations would never have been assigned different code points. As a contrasting example, the relevance of, and distinctions among, special mathematical code points distinguished by mathematical use and specific type styles is less clear: treating them as separate characters may be justified under some circumstances. A more extreme example occurs with some Chinese characters that Unicode treats as compatibility equivalents for other characters. That treatment is usually (and perhaps always) appropriate when the "meaning" of the characters is intended. But some of those characters are, or historically have been, used in personal names. To the extent to which those names are used as identifiers, e.g., for a person thus named, treating them as invalid is inappropriate and applying a compatibility mapping to them is only slightly less bad.

(iii) Experience with IDNA strongly suggests that "DefaultIgnorable" is a bad idea. Forbidding some (or perhaps all) of these characters may be reasonable; treating them as if they were not present impedes possibly-reasonable future extensions and provides an attack vector for phishing and other types of unpleasant behavior that do not appear to be called out in Security Considerations.

It is also worth noting that one difference between IDNA2008 and this specification is that the former is not profiled (except by registries providing their own subsetting rules) while the latter is intended as a base for profiling. While the order in which categorization rules are applied in Section 8 provides some protection (perhaps sufficient protection if the spec is never updated or used as the basis for a different profile), some code points appear in more than one category (e.g., the notorious ZWJ and ZWNJ in both JoinControl (Section 7.8) and PrecisIgnorableProperties (Section 7.13)), and this should at least be called out explicitly.
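
To make the two concerns in (iii) concrete, here is a minimal Python sketch. It is not the draft's algorithm or its tables: the membership sets are tiny hand-picked subsets, the result labels ("CONTEXTJ", "DISALLOWED", "OTHER") are placeholders borrowed from IDNA2008-style vocabulary, and the names categorize, JOIN_FIRST, IGNORABLE_FIRST, and drop_ignorable are invented for the illustration. It shows only that, when a code point sits in two categories, the result depends entirely on rule order, and that silently dropping such characters makes visibly different inputs compare equal.

```python
# Illustrative sketch only; sets and labels are assumptions, not the draft's data.
ZWNJ = "\u200c"         # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"          # ZERO WIDTH JOINER
SOFT_HYPHEN = "\u00ad"  # SOFT HYPHEN

# Hypothetical, abbreviated category memberships. ZWJ and ZWNJ are in both.
JOIN_CONTROL = {ZWNJ, ZWJ}
IGNORABLE = {ZWNJ, ZWJ, SOFT_HYPHEN}


def categorize(cp, rules):
    """Return the label of the first rule whose set contains cp."""
    for label, members in rules:
        if cp in members:
            return label
    return "OTHER"


# Two orderings of the same two rules.
JOIN_FIRST = [("CONTEXTJ", JOIN_CONTROL), ("DISALLOWED", IGNORABLE)]
IGNORABLE_FIRST = [("DISALLOWED", IGNORABLE), ("CONTEXTJ", JOIN_CONTROL)]


def drop_ignorable(s):
    """Treat ignorable characters as if they were not present."""
    return "".join(ch for ch in s if ch not in IGNORABLE)


if __name__ == "__main__":
    # Same code point, different outcome, purely because of rule order.
    print(categorize(ZWJ, JOIN_FIRST))
    print(categorize(ZWJ, IGNORABLE_FIRST))

    # Two different code point sequences collapse to one after dropping.
    spoofed = "pay" + ZWJ + "pal"
    print(spoofed == "paypal", drop_ignorable(spoofed) == "paypal")
```

A profile that reorders or replaces the rules inherits this sensitivity, which is why the overlap deserves an explicit note rather than reliance on ordering alone.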