Re: Last Call: <draft-faltstrom-5892bis-04.txt> (The Unicode code points and IDNA - Unicode 6.0) to Proposed Standard

John C Klensin <john-ietf@xxxxxxx> · Sun, 29 May 2011 11:20:19 -0400

--On Sunday, May 29, 2011 08:58 +0200 Simon Josefsson
<simon@xxxxxxxxxxxxx> wrote:

>> in a Unicode 6.0 environment, evaluate U+19DA as PVALID and
>> therefore not raise that error, then it is not "compliant"
>> with RFC 5892, irrelevant of the "Updates" status of the
>> present document.
> 
> I don't see how.
> 
> My code uses the tables from RFC 5892 which were generated in
> an Unicode 5.2 environment.  My IDNA2008 code may eventually
> run in an Unicode 6.0 environment, or any other future version
> of Unicode.  I can't control the Unicode version used, and
> from what I understand this is one of the features of
> IDNA2008.  Implementations need not lock down the Unicode
> version to a single Unicode version, as they had to do for
> IDNA2003.

It seems to me that this is exactly where we are having a
misunderstanding.   In terms of determining conformance, those
tables are not normative, so it is not possible to say "I
implemented the tables in RFC 5892 and therefore I conform to
the standard".  The closest you can get would be to say "I
implemented the rules and tested against the tables when those
rules were applied to Unicode 5.2 and therefore have great
confidence in my implementation", but conformance statements stop
with "implemented the rules correctly".  

For practical reasons, we expect to see production
implementations using tables or other abstractions of the rules
that are somewhat pre-compiled, not applying the rule set each
time.   One consequence of this is that a given table-based
implementation is inevitably dependent on versions of Unicode
even if the Standard (and its conformance requirements) is not.
That would be true even if the type of change (correction) that
occurred with version 6.0 of Unicode had not occurred. It would
still be necessary to construct version-dependent tables to deal
with newly-assigned code points.
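
The difference between a frozen table and the rules can be sketched in a few lines of Python. This is only an illustration, not an IDNA implementation: LETTER_DIGITS is a simplified stand-in for the LetterDigits category of RFC 5892, Section 2.1, the names FROZEN_TABLE and FROZEN_TABLE_VERSION are hypothetical, and the single table entry stands for whatever a Unicode 5.2-derived table would contain. It assumes the Python runtime ships Unicode 6.0 or later data.

```python
import unicodedata

# Simplified stand-in for the LetterDigits category (RFC 5892, Section 2.1).
# The full rule set has many more categories and exceptions.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def in_letter_digits(cp: int) -> bool:
    """Evaluate the category rule against the runtime's Unicode data."""
    return unicodedata.category(chr(cp)) in LETTER_DIGITS


# Hypothetical precompiled entry, frozen when the rules were applied
# to Unicode 5.2 (where U+19DA had General_Category Nd).
FROZEN_TABLE_VERSION = "5.2.0"
FROZEN_TABLE = {0x19DA: "PVALID"}

cp = 0x19DA
print("runtime Unicode version:", unicodedata.unidata_version)
print("General_Category now:   ", unicodedata.category(chr(cp)))
print("rule says LetterDigits: ", in_letter_digits(cp))
print("frozen table says:      ", FROZEN_TABLE[cp], "(from", FROZEN_TABLE_VERSION + ")")
```

On a runtime with Unicode 6.0 or later data, the rule and the frozen entry disagree for this code point, which is the version dependency described above.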

From the perspective of those who argued that the document
titled "...5892bis.." should not be produced and published
because it is unnecessary, the point is that we would not have
generated the document at all had the only changes been the
addition of new PROTOCOL-VALID and DISALLOWED code points by
virtue of new code points being added to Unicode.  But, in
practical terms, that is a much greater change to an
implementation than anything related to these few characters
with changed properties.

And, again, this situation would be true of virtually any
specification that depends on Unicode, regardless of whether the
definition is in terms of  rules/properties or tables. There
would be an exception if the specification depended on code
point assignments alone and was okay with treating unassigned
code points as if they had been assigned if they turned up in
the data stream (IDNA2003 attempted to lay the foundation for
the latter but failed because all of the properties that an
unassigned code point will have when it is assigned cannot be
known).  For anything else, working properly with a given
version of Unicode requires updating of code point tables,
normalization tables, and assorted property tables.   As Mark
points out, defining things in terms of the tables, with the
rules providing only guidance, has some important advantages in
this regard.  However, it guarantees the need to talk about
conformance to a Unicode version, not just "Unicode".

> If this model is not permitted, I believe there are bigger
> problems.
> 
> To avoid doubt, and to back up your assertion that my
> implementation is non-compliant, please point to the "MUST" or
> "SHOULD" in RFC 5892 that forbids this, to me, logical
> implementation approach.

The key is the text in Section 4 that says:

	"The table in Appendix B shows, for illustrative
	purposes, the consequences of the categories and
	classification rules, and the resulting property values.

	"The list of code points that can be found in Appendix B
	is non-normative.  Sections 2 and 3 are normative."

It seems to me that is very clear about the relationship between
the rules and the tables.   That relationship is reiterated in
Section 7.1.1 of RFC 5892.

You could reasonably say that your implementation is conformant
but current only to Unicode 5.2.   If you are willing to say
that, I guess you don't need to change anything.   While we
recognize that you have no control over the Unicode version in
use, good sense suggests that systems will update versions of
Unicode (including all of the associated tables and support
routines as applicable) and versions of your library together.
While that should be clear from the context of the discussions
in RFC 5891 and 5892, RFC 5894 is quite explicit about it in the
second bullet of Section 7.1.2:

 "o The Unicode tables (i.e., tables of code points,
	character classes, and properties) and IDNA tables
	(i.e., tables of contextual rules such as those
	that appear in the Tables document), must be
	consistent on the systems performing or validating
	labels to be registered.  Note that this does not
	require that tables reflect the latest version of
	Unicode, only that all tables used on a given
	system are consistent with each other."

Similarly, the first bullet of 7.1.3 reads:

 "o Maintain IDNA and Unicode tables that are consistent
	with regard to versions, i.e., unless the application
	actually executes the classification rules in the Tables
	document [RFC5892], its IDNA tables must be derived from
	the version of Unicode that is supported more generally on
	the system.  As with registration, the tables need not
	reflect the latest version of Unicode, but they must be
	consistent."
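
One way a table-based library might guard against the inconsistency those bullets warn about is to record the Unicode version its tables were derived from and compare it with the version the system supports. The following Python sketch assumes the system's Unicode support is what the standard unicodedata module reports; IDNA_TABLE_UNICODE_VERSION and tables_consistent are hypothetical names, and comparing only major.minor is a design choice for the sketch, not something the RFCs specify.

```python
import unicodedata

# Hypothetical: the Unicode version the precompiled IDNA tables came from.
IDNA_TABLE_UNICODE_VERSION = "5.2.0"


def tables_consistent(table_version: str,
                      runtime_version: str = unicodedata.unidata_version) -> bool:
    """Return True if both versions share the same major.minor numbers."""
    return table_version.split(".")[:2] == runtime_version.split(".")[:2]


if not tables_consistent(IDNA_TABLE_UNICODE_VERSION):
    print("IDNA tables derived from Unicode", IDNA_TABLE_UNICODE_VERSION,
          "but the runtime supports", unicodedata.unidata_version)
```

A library could log this, refuse to validate labels, or fall back to executing the rules directly; which of those is appropriate is left open here.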

I hope that helps.

best,
     john
