Re: [precis] Last Call: <draft-ietf-precis-framework-15.txt> (PRECIS Framework: Preparation and Comparison of Internationalized Strings in Application Protocols) to Proposed Standard

Peter Saint-Andre <stpeter@xxxxxxxxxx> · Thu, 24 Apr 2014 09:46:27 -0600

On 4/23/14, 10:17 PM, John C Klensin wrote:

--On Wednesday, April 09, 2014 12:07 -0700 The IESG
<iesg-secretary@xxxxxxxx> wrote:

The IESG has received a request from the Preparation and
Comparison of Internationalized Strings WG (precis) to
consider the following document: - 'PRECIS Framework:
Preparation and Comparison of Internationalized Strings in
Application Protocols'
   <draft-ietf-precis-framework-15.txt> as Proposed Standard

The IESG plans to make a decision in the next few weeks, and
solicits final comments on this action. Please send
substantive comments to the ietf@xxxxxxxx mailing lists by
2014-04-23.

IESG,

I have deliberately waited until the end of the Last Call period
as posted [1], hoping that you would get (or generate) more
focused comments on the draft in the interim.

John, that seems like an exceedingly unrealistic hope. You know better 
than anyone that internationalization expertise is spread very thinly in 
the IETF (and elsewhere). Folks from the IETF community who care about 
i18n are already likely to have provided feedback on the PRECIS 
documents. We just don't have very many such people.

The issues
discussed in this note were raised while what became the PRECIS
WG (originally known by names like "Stringprep-bis") was being
discussed and chartered and again on several occasions during
the meetings of the WG.   At least in my opinion, they were
never discussed seriously: the WG made one important improvement
(shifting to a rule-based inclusion approach) but has
essentially ignored the fundamental problem.  Because the
predecessors of this note have not been actively considered in
the WG or while its charter was being created, I don't actually
expect it to accomplish anything other than to get some issues
on the record.  But I think the circumstances require one last
try.

Ignored might be a bit strong. Perhaps those active in the WG did not 
see a way to achieve the high goal you have set here. Throwing up our 
hands and doing nothing (and thus remaining stuck on Stringprep and 
Unicode 3.2) seems like it would have been a worse outcome.

This note consists of a high-level summary with details in
footnotes on its various points.  It is as short as I can make
it without leaving out issues and explanations that I believe
are important.

Summary:

I see fundamental problems with approval of this specification
and some issues with the decisions it includes.  The first of
these is temporary; the second more fundamental.

(1) After the PRECIS WG task list settled down into its charter
[2], the goal became, in essence, to replace Stringprep with
something more explicitly rule-based and to update (or advise
on) the various uses of Stringprep profiles.  As described in
the PRECIS charter, that approach noted the relationships among
the various parts of the PRECIS work and called for handling the
"framework, profile replacements, and guidelines ... in parallel
as much as possible".  The interrelated nature of the pieces
makes it inappropriate for the IESG to approve the "framework"
document for publication on the standards track at this time
because doing so would foreclose some reasonable discussion of
issues with the other specs -- exactly the situation the charter
language was intended, at least in my recollection, to prevent.
The approval and publication of RFC 6885 over a year ago has
already been used that way.

Some of the decisions reflected in the current draft
illustrate why more testing against specific profile examples
and guidelines is important [5].

We have 5 profiles in a quite advanced state, specified in 3 I-Ds 
(draft-ietf-precis-saslprepbis, draft-ietf-

Recommendation: Tentatively approve the document now if that
seems otherwise justified (I suggest below that it is not) but
do not issue a Protocol Action Notice or hand off to the RFC
Editor until after a sufficient number of "profile
replacements" and "guidelines" have been completed and examined
through the IETF Last Call process to validate the "framework"
provisions.  The analogy of such a request to "running code"
requirements should be obvious.

IMHO we are close to having that running code, but we haven't pushed all 
the documents forward at exactly the same time.

(2) The longer-term problem is the one that PRECIS simply
refused to address.

It's a hard problem. Refused might be a bit strong.

When designing systems with direct effects
on and interactions with users, predictability based on prior
user experience and expectations is vital.  That predictability
is often expressed as the Law of Least Astonishment.  In the
case of use of non-ASCII characters in Internet applications,
"least astonishment" implies, first, that rules should be
intuitive and easily understood to the extent possible.  Second
and more important, if a user understands the rules of one
application, she should be able to accurately extrapolate from
that application to others.  If there is a reason why that
extrapolation does not work, the reasons should be clearly
expressed and sufficiently comprehensible to be easily
incorporated into the user's cognitive framework... and there
should be few such cases.

That sounds ideal, but unfortunately we are dealing with a quite messy 
reality in which we have many different kinds of identifiers - domain 
names, email addresses, file names, chatrooms, IM handles, nicknames, 
etc. Reducing them all to one thing seems unrealistic to me. I would 
love to achieve that, but I don't see it happening anytime soon.

With the addition of "IdentifierClass" and "FreeformClass" to
what, in its language, might be called "IDNA2008Class", we
already have between one and two "Classes" too many for good
user predictability.  The language that indicates that
additional classes might be defined in future specifications is
not nearly strong enough in discouraging additional classes and
explaining the reasons why.

The PRECIS framework appears to fail those tests.  It can be
read as actively encouraging multiple profiles [3].

Because we have a world with multiple identifiers.

This is
true even though the third bullet under "It is expected that
this framework will yield..." explicitly acknowledges "more
accurate expectations about the characters that are acceptable
in various contexts" as an explicit objective.

Recommendation: Hold this document until the various categories
are exercised by at least a sampling of applicable profiles.

See above. We have 5 profiles very close to done.

Also: hold in what sense?

These documents are going forward to Proposed Standard. IMHO we need 
further experience with them, in the field, and possible revision in the 
future. But we have a stronger basis for that now (with PRECIS) than we 
ever would have had with Stringprep.

I think the WG realizes that PRECIS is not perfect as it is. I doubt 
that perfection can be achieved in the messy domain of i18n, though.

Make the problems with excessive profiles explicit; impose
requirements on additional profiles that include an explanation
of why they are needed given the user-level interoperability and
astonishment problems that can result from even three of them
and require a higher level of review and consensus for creating
new ones than "expert review".  Modify the Security
Considerations section to identify user astonishment and
consequent confusion about what rules are being applied in a
given PRECIS-User protocol as a risk and possible attack vector.

Those do seem like good things.

I am out of time right now to reply to the topics from your footnotes.

Peter

thanks,
     john

   -----------------------------

[3] Not only should multiple profiles be discouraged for this
type of case by the Law of Least Astonishment, but general IETF
experience indicates that profiles are a bad idea and that
protocols that depend on them (and use a significant variety)
tend to be less successful than those that do not.

[4] Sections 3.2.2 and 3.3.2 of -15.

[5] There are issues with the two classes that are defined in
this document that have not, I believe, been carefully addressed
in the WG.  At a minimum, they need to be considered more
carefully in context with specific examples without preempting
the choices to be made for those examples.  The following are
examples, not an exclusive list:

(i) The "Contextual Rule" categories of IDNA2008 have proven to
be extremely problematic, confusing, and controversial enough to
be one of the key issues in the UTR 46 battles that have impeded
IDNA2008 adoption in some important communities.  While I still
believe that the IDNA2008 decision was the right one, inclusion
of those characters as Valid for both classes [4] should, given
those bad experiences, require significantly more explanation
and/or justification than this draft appears to provide.  If the
intent of incorporating this list in PRECIS is to provide for
compatibility with IDNA2008, the list should be incorporated by
reference (either to RFC 5892 and its successors or to a
separate document that updates and replaces the list in RFC
5892 as well as being used for PRECIS), not be provided as a
list of code points that could cause the two definitions to
diverge if changes are made to one or the other.  The issue may
apply more generally to the other characters listed in Section
7.6 and to other IDNA-derived elements of Section 7.

(ii) Characters with compatibility equivalents are problematic,
in large measure because of a Unicode issue that has been
extensively discussed in the past.  Basically the compatibility
equivalency category mixes several unrelated and incompatible
(sic) groups of things under a single label.  One extreme is
occupied by characters used with East Asian scripts that are
distinguished only by their widths.  If the Unicode design
principles of "characters, not glyphs" and "plain text" had been
followed, these different-width variations would never have been
assigned different code points.  As a contrasting example, the
relevance of, and distinctions among, special mathematical code
points distinguished by mathematical use and specific type
styles is less clear: treating them as separate characters may
be justified under some circumstances.  A more extreme example
occurs with some Chinese characters that Unicode treats as
compatibility equivalents for other characters.  That treatment
is usually (and perhaps always) appropriate when the "meaning"
of the characters is intended.  But some of those characters
are, or historically have been, used in personal names.  To the
extent to which those names are used as identifiers, e.g., for a
person thus named, treating them as invalid is inappropriate and
applying a compatibility mapping to them only slightly less bad.
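
The width and type-style cases in (ii) can be seen with a minimal sketch
using Python's standard unicodedata module.  It is illustrative only and
not part of the draft: the two sample characters and the helper name
describe() are assumptions chosen for this example.  It shows that
unrelated kinds of variation carry different decomposition tags yet
collapse to the same character under NFKC.

```python
import unicodedata

# Sample characters chosen for illustration; they are not taken from the draft.
SAMPLES = {
    "fullwidth A": "\uFF21",             # width variant used with East Asian scripts
    "mathematical bold A": "\U0001D400", # type-style variant used in mathematics
}

def describe(ch):
    """Return (decomposition tag and target, NFKC result) for one character."""
    return unicodedata.decomposition(ch), unicodedata.normalize("NFKC", ch)

for label, ch in SAMPLES.items():
    tag, folded = describe(ch)
    print(f"{label}: U+{ord(ch):04X} decomposition={tag!r} NFKC={folded!r}")
```

Both samples fold to the plain letter A, so any rule keyed only on
"has a compatibility decomposition" treats the two cases alike even
though the reasons for the separate code points differ.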

(iii) Experience with IDNA strongly suggests that
"DefaultIgnorable" is a bad idea.  Forbidding some (or perhaps
all) of these characters may be reasonable; treating them as if
they were not present impedes possibly-reasonable future
extensions and provides an attack vector for phishing and other
types of unpleasant behavior that do not appear to be called out
in Security Considerations.  It is also worth noting that one
difference between IDNA2008 and this specification is that the
former is not profiled (except by registries providing their own
subsetting rules) while the latter is intended as a base for
profiling.  While the order in which categorization rules are
applied in Section 8 provides some protection (perhaps
sufficient protection if the spec is never updated or used as
the basis for a different profile), it is worth noting that some
code points appear in more than one category (e.g., the
notorious ZWJ and ZWNJ in both JoinControl (Section 7.8) and
PrecisIgnorableProperties (Section 7.13)) and that this should
at least be called out explicitly.
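
A rough sketch of the two concerns in (iii), under stated assumptions:
the category sets below are small hypothetical stand-ins for the real
ones (which are defined by Unicode properties and by Sections 7.8 and
7.13), the outcome labels are illustrative, and the function names are
invented for this example.  It is not the draft's algorithm.  It shows
that a code point present in two categories gets whichever outcome is
tested first, and that silently dropping ignorable characters makes two
different code point sequences compare equal.

```python
# Hypothetical, simplified category sets for illustration only.
JOIN_CONTROL = {0x200C, 0x200D}        # ZWNJ, ZWJ
IGNORABLE = {0x200C, 0x200D, 0x00AD}   # assumed subset: ZWNJ, ZWJ, soft hyphen

def classify(cp, rules):
    """Return the label of the first rule whose set contains cp."""
    for label, members in rules:
        if cp in members:
            return label
    return "OTHER"

# Two possible rule orderings; only the order differs.
JOIN_FIRST = [("CONTEXTJ", JOIN_CONTROL), ("DISALLOWED", IGNORABLE)]
IGNORABLE_FIRST = [("DISALLOWED", IGNORABLE), ("CONTEXTJ", JOIN_CONTROL)]

def strip_ignorables(s):
    """Treat ignorable code points as if they were not present."""
    return "".join(ch for ch in s if ord(ch) not in IGNORABLE)

zwj = 0x200D
print(classify(zwj, JOIN_FIRST), classify(zwj, IGNORABLE_FIRST))

spoofed = "pay\u200dpal"               # contains an invisible ZWJ
print(spoofed == "paypal", strip_ignorables(spoofed) == "paypal")
```

With the "ignore" treatment the spoofed string matches the plain one
after stripping, which is the kind of confusion a "forbid" treatment
would reject outright.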
