Re: [precis] Last Call: <draft-ietf-precis-framework-15.txt> (PRECIS Framework: Preparation and Comparison of Internationalized Strings in Application Protocols) to Proposed Standard

John C Klensin <john-ietf@xxxxxxx> · Thu, 24 Apr 2014 00:17:52 -0400

--On Wednesday, April 09, 2014 12:07 -0700 The IESG
<iesg-secretary@xxxxxxxx> wrote:

> The IESG has received a request from the Preparation and
> Comparison of Internationalized Strings WG (precis) to
> consider the following document: - 'PRECIS Framework:
> Preparation and Comparison of Internationalized    Strings in
> Application Protocols'
>   <draft-ietf-precis-framework-15.txt> as Proposed Standard
> 
> The IESG plans to make a decision in the next few weeks, and
> solicits final comments on this action. Please send
> substantive comments to the ietf@xxxxxxxx mailing lists by
> 2014-04-23.

IESG,

I have deliberately waited until the end of the Last Call period
as posted [1], hoping that you would get (or generate) more
focused comments on the draft in the interim.  The issues
discussed in this note were raised while what became the PRECIS
WG (originally known by names like "Stringprep-bis") was being
discussed and chartered and again on several occasions during
the meetings of the WG.   At least in my opinion, they were
never discussed seriously: the WG made one important improvement
(shifting to a rule-based inclusion approach) but has
essentially ignored the fundamental problem.  Because the
predecessors of this note have not been actively considered in
the WG or while its charter was being created, I don't actually
expect it to accomplish anything other than to get some issues
on the record.  But I think the circumstances require one last
try.

This note consists of a high-level summary with details in
footnotes on its various points.  It is as short as I can make
it without leaving out issues and explanations that I believe
are important.

Summary:

I see fundamental problems with approval of this specification
and some issues with the decisions it includes.  The first of
these is temporary; the second more fundamental.  

(1) After the PRECIS WG task list settled down into its charter
[2], the goal became, in essence, to replace Stringprep with
something more explicitly rule-based and to update (or advise
on) the various uses of Stringprep profiles.  As described in
the PRECIS charter, that approach noted the relationships among
the various parts of the PRECIS work and called for handling the
"framework, profile replacements, and guidelines ... in parallel
as much as possible".  The interrelated nature of the pieces
makes it inappropriate for the IESG to approve the "framework"
document for publication on the standards track at this time
because doing so would foreclose some reasonable discussion of
issues with the other specs -- exactly the situation the charter
language was intended, at least in my recollection, to prevent.
The approval and publication of RFC 6885 over a year ago has
already been used that way.

Some of the decisions reflected in the current draft
illustrate why more testing against specific profile examples
and guidelines is important [5].

Recommendation: Tentatively approve the document now if that
seems otherwise justified (I suggest below that it is not) but
do not issue a Protocol Action Notice or handoff to the RFC
Editor until after a sufficient number of "profile
replacements" and "guidelines" have been completed and examined
through the IETF Last Call process to validate the "framework"
provisions.  The analogy of such a request to "running code"
requirements should be obvious.

(2) The longer-term problem is the one that PRECIS simply
refused to address.  When designing systems with direct effects
on and interactions with users, predictability based on prior
user experience and expectations is vital.  That predictability
is often expressed as the Law of Least Astonishment.  In the
case of use of non-ASCII characters in Internet applications,
"least astonishment" implies, first, that rules should be
intuitive and easily understood to the extent possible.  Second
and more important, if a user understands the rules of one
application, she should be able to accurately extrapolate from
that application to others.  If there is a reason why that
extrapolation does not work, the reasons should be clearly
expressed and sufficiently comprehensible to be easily
incorporated into the user's cognitive framework... and there
should be few such cases.

With the addition of "IdentifierClass" and "FreeformClass" to
what, in its language, might be called "IDNA2008Class", we
already have between one and two "Classes" too many for good
user predictability.  The language that indicates that
additional classes might be defined in future specifications is
not nearly strong enough in discouraging additional classes and
explaining the reasons why.

The PRECIS framework appears to fail those tests.  It can be
read as actively encouraging multiple profiles [3].  This is
true even though the third bullet under "It is expected that
this framework will yield..." lists "more accurate expectations
about the characters that are acceptable in various contexts"
as an explicit objective.

Recommendation: Hold this document until the various categories
are exercised by at least a sampling of applicable profiles.
Make the problems with excessive profiles explicit.  Require
that any additional profile include an explanation of why it is
needed, given the user-level interoperability and astonishment
problems that can result from even three of them, and require a
higher level of review and consensus than "expert review" for
creating new ones.  Modify the Security Considerations section
to identify user astonishment, and the consequent confusion
about what rules are being applied in a given PRECIS-using
protocol, as a risk and possible attack vector.

thanks,
    john

  -----------------------------

[1] An apparently-automatic announcement dated April 23, 2014
00:07 -0700 indicated that the Last Call period had ended and
the document state had changed.  The Last Call announcement
quoted above, however, says "by 2014-04-23".  To the best of my
knowledge, there has never been an announcement to the
community, much less one backed by community discussion and
consensus, that "by <date>" now means "00:07 California time on
that date".  If anything, we have tended to interpret "by
<date>" as COB on that date or the last minute of that date
anywhere in the world.

[2] http://datatracker.ietf.org/wg/precis/charter/

[3] Not only should multiple profiles be discouraged for this
type of case by the Law of Least Astonishment, but general IETF
experience indicates that profiles are a bad idea and that
protocols that depend on them (and use a significant variety)
tend to be less successful than those that do not.  

[4] Sections 3.2.2 and 3.3.2 of -15.

[5] There are issues with the two classes that are defined in
this document that have not, I believe, been carefully addressed
in the WG.  At a minimum, they need to be considered more
carefully in context with specific examples without preempting
the choices to be made for those examples.  The following are
examples, not an exclusive list:

(i) The "Contextual Rule" categories of IDNA2008 have proven to
be extremely problematic, confusing, and controversial enough to
be one of the key issues in the UTR 46 battles that have impeded
IDNA2008 adoption in some important communities.  While I still
believe that the IDNA2008 decision was the right one, inclusion
of those characters as Valid for both classes [4] should, given
those bad experiences, require significantly more explanation
and/or justification than this draft appears to provide.  If the
intent of incorporating this list in PRECIS is to provide for
compatibility with IDNA2008, the list should be incorporated by
reference (either to RFC 5892 and its successors or to a
separate document that updates and replaces the list in RFC
5892 as well as being used for PRECIS), not be provided as a
list of code points that could cause the two definitions to
diverge if changes are made to one or the other.  The issue may
apply more generally to the other characters listed in Section
7.6 and to other IDNA-derived elements of Section 7.

(ii) Characters with compatibility equivalents are problematic,
in large measure because of a Unicode issue that has been
extensively discussed in the past.  Basically the compatibility
equivalency category mixes several unrelated and incompatible
(sic) groups of things under a single label.  One extreme is
occupied by characters used with East Asian scripts that are
distinguished only by their widths.  If the Unicode design
principles of "characters, not glyphs" and "plain text" had been
followed, these different-width variations would never have been
assigned different code points.  As a contrasting example, the
relevance of, and distinctions among, special mathematical code
points distinguished by mathematical use and specific type
styles is less clear: treating them as separate characters may
be justified under some circumstances.  A more extreme example
occurs with some Chinese characters that Unicode treats as
compatibility equivalents for other characters.  That treatment
is usually (and perhaps always) appropriate when the "meaning"
of the characters is intended.  But some of those characters
are, or historically have been, used in personal names.  To the
extent to which those names are used as identifiers, e.g., for a
person thus named, treating them as invalid is inappropriate and
applying a compatibility mapping to them only slightly less bad.
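
To make the "single label" point concrete, here is a minimal
sketch using Python's standard unicodedata module.  The two
sample code points are my own choice for illustration and are
not taken from the draft.  A width variant and a mathematical
style variant both carry a compatibility decomposition and both
fold to the same base letter under NFKC; only the decomposition
tag hints that they belong to different groups.

```python
import unicodedata

# Sample code points chosen for illustration only (an assumption,
# not a list from the draft).
SAMPLES = {
    "fullwidth A": "\uFF21",      # FULLWIDTH LATIN CAPITAL LETTER A
    "math bold A": "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A
}


def compat_info(ch):
    """Return (decomposition field, NFKC form) for one character."""
    return unicodedata.decomposition(ch), unicodedata.normalize("NFKC", ch)


for label, ch in SAMPLES.items():
    decomp, folded = compat_info(ch)
    print(f"{label}: U+{ord(ch):04X} decomposition={decomp!r} NFKC={folded!r}")
```

A rule keyed only on "has a compatibility decomposition" cannot
tell these cases apart, which is why the choice between
rejecting and mapping such characters probably deserves
per-group treatment.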

(iii) Experience with IDNA strongly suggests that
"DefaultIgnorable" is a bad idea.  Forbidding some (or perhaps
all) of these characters may be reasonable; treating them as if
they were not present impedes possibly-reasonable future
extensions and provides an attack vector for phishing and other
types of unpleasant behavior that do not appear to be called out
in Security Considerations.  It is also worth noting that one
difference between IDNA2008 and this specification is that the
former is not profiled (except by registries providing their own
subsetting rules) while the latter is intended as a base for
profiling.  While the order in which categorization rules are
applied in Section 8 provides some protection (perhaps
sufficient protection if the spec is never updated or used as
the basis for a different profile), it is worth noting that some
code points appear in more than one category (e.g., the
notorious ZWJ and ZWNJ in both JoinControl (Section 7.8) and
PrecisIgnorableProperties (Section 7.13)) and that this should
at least be called out explicitly.
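
The two concerns in (iii) can be sketched in a few lines of
Python.  Everything here is hypothetical: the sets contain only
the two code points named above, the function names are
invented, and the labels "CONTEXTJ" and "DISALLOWED" are used
purely for illustration rather than as a statement of what the
draft's algorithm returns.  The sketch shows, first, that
silently dropping ignorable code points makes two distinct
strings compare equal, and second, that when a code point sits
in two categories the result depends entirely on rule order.

```python
# Hypothetical stand-in sets: only the two code points named in the
# note (ZWNJ and ZWJ) are listed, not the draft's category contents.
ZWNJ, ZWJ = "\u200C", "\u200D"
JOIN_CONTROL = {ZWNJ, ZWJ}
IGNORABLE = {ZWNJ, ZWJ}


def drop_ignorable(s):
    """Treat ignorable code points as if they were not present."""
    return "".join(ch for ch in s if ch not in IGNORABLE)


def reject_ignorable(s):
    """Alternative policy: refuse any string containing such a code point."""
    for ch in s:
        if ch in IGNORABLE:
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return s


def classify(ch, rules):
    """Return the label of the first rule whose set contains ch."""
    for members, label in rules:
        if ch in members:
            return label
    return "UNCLASSIFIED"


# The same two rules in two different orders; labels are illustrative.
JOIN_FIRST = [(JOIN_CONTROL, "CONTEXTJ"), (IGNORABLE, "DISALLOWED")]
IGNORABLE_FIRST = [(IGNORABLE, "DISALLOWED"), (JOIN_CONTROL, "CONTEXTJ")]

plain = "example"
spoof = "exam" + ZWJ + "ple"  # contains an invisible joiner

print(spoof == plain)                  # compares the raw strings
print(drop_ignorable(spoof) == plain)  # compares after dropping ZWJ
print(classify(ZWJ, JOIN_FIRST), classify(ZWJ, IGNORABLE_FIRST))
```

Under the "drop" policy the two identifiers collide, while the
"reject" policy surfaces the difference to the caller.  The
classification result flips when the rule order does, so any
future profile or update that reorders the rules changes the
outcome for these code points.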