Re: [Last-Call] Opsdir last call review of draft-faltstrom-unicode12-03

John C Klensin <john-ietf@xxxxxxx> · Fri, 03 Dec 2021 11:21:59 -0500

(Most of the comments in this note are likely to be of specific
interest to Patrik, Tim, and those deeply involved with IDNA.
Other busy people, probably including the IESG, might want to
just skip to the two paragraphs at the end and then treat the
rest of the text as examples they may or may not want to study.)

--On Friday, December 3, 2021 13:13 +0100 Patrik Fältström
<paf=40frobbit.se@xxxxxxxxxxxxxx> wrote:

>...
>> I just find it surprising to see completely unreferenced
>> Appendices in the document.  Why are they there if not
>> referenced somewhere?  Some of them do not match to the
>> changes discussed in section 3.
> 
> Hmm....I added this to the section titled "Notable Changes
> Between Unicode 6.0.0 and 12.0.0":
> 
>     Among the changes between the Unicode versions, most code
>     points that change derived property value change from
>     UNASSIGNED to PVALID or from UNASSIGNED to DISALLOWED. The
>     interesting changes in derived property values include other
>     changes. All changes between the major versions of Unicode
>     can be found in <xref target="Appendix-6.0.0"/>,
>     <xref target="Appendix-7.0.0"/>, <xref target="Appendix-8.0.0"/>,
>     <xref target="Appendix-9.0.0"/>, <xref target="Appendix-10.0.0"/>
>     and <xref target="Appendix-11.0.0"/>.

Patrik,

Since this is a fine-tuning effort (which I appreciate) to make
things crystal-clear, I suggest replacing "interesting" with
"significant" or "important".  "Interesting" in your sentence
just has too many possible implications.  For the same
purpose, what might be even better would be to rephrase as follows:

NEW (from your note):
	The interesting changes in derived property values
	include other changes. 

NEWER (suggested):
	Changes in derived property values other than those
	transitions are normally the only ones of significance
	to IDNA users and applications.

That is a quick thought.  I think it can be improved further,
but hope we can all trust the RPC to work it out.

Also...

>>> All changes are listed in the Appendices.
>> 
>> I don't see where the 2+25 figures are taken from.
> 
> Unicode 6.0.0 already have code points with various derived
> property values. 

Does this refer to the text of Unicode 6.0.0 or is it actually
about RFC 6452?  If the latter, perhaps "RFC 6452 has already
identified code points with various derived property values".
If it is really about Unicode 6.0.0, I don't quite understand
the point because _every_ version of Unicode has "Derived
Properties" [1].  If it does not affect the text of the
document, maybe we don't care other than for the edification of
Tim and others as we move forward (see comment at end).

> In the list of "changes from 6.0.0 to 7.0.0"
> I have written "CONTEXTJ did not change, at 2". To me this is
> crystal clear the number of code points with the derived
> property value CONTEXTJ is 2 in Unicode 7.0.0 as it was in
> Unicode 6.0.0. That it is 2 can be found in earlier RFCs or in
> the IANA registry.

You might be able to make that a bit more clear by saying
something like "There were no changes to the number of code
points assigned to CONTEXTJ; the number remains at 2" or words
to that general effect (again see comment at end).

>...
>>>> Comments:
>>>> 
>>>> In section 1, CONTEXT is explained, but the later use of
>>>> CONTEXTJ and CONTEXTO are not.  This would be useful to
>>>> include.
>>> 
>>> See section 1 of RFC 5892. I will add a clarification as
>>> follows:
>>> 
>>>  As explained in <xref target="RFC5892">RFC
>>>  5892</xref> CONTEXT is in turn divided into CONTEXTJ and
>>>  CONTEXTO.
>> 
>> Thanks.

If there is still any remaining confusion (and maybe even if
there is not), perhaps a reference to Section 3.1 of RFC 5894
would be helpful.  It seems to me that part of Tim's confusion
(and likely confusion by readers who, like him, are not
thoroughly familiar with IDNA2008) is that RFC 5892 talks about
those contextual rules and how and why they are assigned to code
points but not about what they are for, why there are two types
and how they differ in practice, and so on.  Some of that
material is in RFC 5891 ("the Protocol Document") to which the
category definitions of RFC 5892 explicitly point, but the more
or less plain English, quasi-tutorial, explanation is in RFC
5894.   And see comment at end.

>>>> Section 2, penultimate para, is the first use, unexplained,
>>>> of CONTEXTO/J.
>>> 
>>> Changed to:
>>> 
>>>  The IDNA2008 rules use the Unicode Standard to
>>>  create a further subset of code points and context that are
>>>  permitted in DNS labels associated with its PVALID, and
>>>  CONTEXT (CONTEXTJ or CONTEXTO) derived property values. DNS
>>>  registries and other organizations that deal with IDNs are
>>>  supposed to create their own subsets from IDNA2008 for use
>>>  by those registries and organizations.
>> 
>> Thanks.

While these are improvements, I am concerned about the path this
is going down, especially because I expect this document to set
precedents for its successors.  A paragraph like the above,
while helpful, is basically a reprise of text that already
appears in RFCs 5890, 5894, 8753, and 5892 itself.  That strikes
me as unwise: it turns this document into more of a tutorial
than it should be, creates risks of having to update it (or
having it further confuse readers) if IDNA2008 definitions are
ever upgraded, and makes it harder for someone who is thoroughly
familiar with IDNA2008 and its applications, and why this type
of document is needed at all, to find information that is
relevant to them and their needs.  

Similar comments apply to the explanation of why conservatism is
important even though a one-sentence cross-reference is, at
worst, harmless. 

See comment at end.

>>>> In Section 2, last para, maybe point forward to the
>>>> security section regarding the reason for conservatism?
>>> 
>>> Added paragraph at the end:
>>> 
>>>  See also the Security Considerations section in this
>>>  document.
>> 
>> Thanks.

See above.

>...

While each of these changes makes the I-D better when it is
viewed as an isolated document, I fear that we may be going down
the wrong path.  As I think the Introduction makes fairly clear,
this document is part of the IDNA2008 collection and is only
marginally comprehensible without an understanding of the base
documents (RFCs 5890-5894).  Every bit of tutorial explanation
that is added here is a step toward encouraging people to
believe that they can read this document and understand what it,
and the tables and their application, are about without
understanding IDNA2008.   Many of the issues Tim has caught in
his obviously careful reading are very real.  The fixes are
significant and important --  Patrik has already thanked him and
so do I, but the community owes him thanks more generally.   But
others, and Patrik's response to them, seem to be straying in
the direction of tutorial information that already appears in
the base IDNA2008 documents or in RFC 8753.  I think we should
be very cautious about that both because of the "belief" issue
above and because I (and I think everyone else who has commented
recently) would prefer to get this document out than to have to
carefully check these tutorial explanations for complete
consistency with the IDNA2008 base, down to reviewing the impact
of even slight differences in terminology and picking the last
nit.  

So, again in the interest of being done with this and letting
those involved move on to other work (including thinking about
how to get the Unicode 13.x and 14.x reviews moving forward), I
suggest that we do not try to un-do any of the changes that have
been made already, fixing them when needed.  If the fixes to
tutorial explanations involve significant work and/or thought,
removing those paragraphs should be considered as an
alternative, but I haven't seen any cases of that (my
suggestions about the changes above notwithstanding).  In
retrospect, I wonder whether Section 3.2 (or most of it) even
belongs here -- that material mostly is a repetition and
clarification of material already in the base documents.  Where
it is not, we might have been better off with a separate
document that explicitly updates part of the core.  However, I
don't think that is worth changing the I-D, or even debating the
point, now.   I make one suggestion to clarify the situation:
Add a sentence in the Introduction that clearly conveys the
principle that this document is a supplement to the core
IDNA2008 specs and that anyone who tries to read it without a
solid understanding of those specs is going to be in Big Trouble
and might be led into error (I trust Patrik can come up with a
less inappropriate phrasing while making the point).   But let's
not try to add more tutorial materials or keep trying to
fine-tune: let's get the document out and move on.  

Thanks,
   john

[1] See TUS version 13.0, Section 3.5, definition D46.
