Hi all,
I read this draft and share the concerns Arnt brought up, so I am not going to repeat them here.
I think this draft should be Informational, since it offers recommendations and guidance for registry operators without changing IDNA2008. It fits better as a standalone Informational document or as RFC 5894bis than as RFC 5891bis. The only change it makes to RFC 5891 is to address the errata on the relative lengths of U-labels and A-labels. The draft tightens the language, terminology, and definitions used to describe label length, but that has nothing to do with advising registry operators on the selection of Unicode code points.
This draft intends to update RFC 5890, 5891 and 5894, but section 5 only covers the updates to RFC 5890 and 5891; it does not say where it updates RFC 5894. I take it that the target is section 3.2 of RFC 5894, but I see this draft as an expansion of or elaboration on registry policy rather than an update to it.
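As an illustration of why the relative label length needs careful wording, here is a minimal Python sketch that measures one label three ways. It uses the standard library's "punycode" codec and prepends the "xn--" prefix by hand; the function name label_lengths is made up for this example, and the sketch assumes the input is already a valid, normalized, lowercase U-label with at least one non-ASCII character. It performs no IDNA2008 validation.

```python
def label_lengths(u_label: str) -> dict:
    """Measure a U-label and its A-label form (illustration only).

    Assumption: u_label is already a valid, NFC-normalized, lowercase
    U-label containing at least one non-ASCII character. No IDNA2008
    checks are performed here.
    """
    # The A-label is the ACE prefix "xn--" followed by the Punycode encoding.
    a_label = "xn--" + u_label.encode("punycode").decode("ascii")
    return {
        "a_label": a_label,
        "u_code_points": len(u_label),                  # length in Unicode code points
        "u_utf8_octets": len(u_label.encode("utf-8")),  # length in UTF-8 octets
        "a_octets": len(a_label),                       # A-labels are pure ASCII
    }


if __name__ == "__main__":
    print(label_lengths("bücher"))
```

The three numbers differ for the same label, so any statement about "the length" has to say which form and which unit it means.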
Best,
Joseph
On Mon, Oct 14, 2024 at 7:59 AM Arnt Gulbrandsen <arnt@xxxxxxxxxxxxxxxxxxx> wrote:
I wrote about sections 3 and 4:
> The advice doesn't seem awfully contentious. One can write
> IDNA2008-compliant code without this advice, though, so it's not
> obvious to me that it ought to be included in a document about
> IDNA2008.
If included, then I have qualms about the wording of section 3 and
whether to include the costs of abuse handling in section 4.
Section 3, advice for all TLD registries.
There's a touch of "be afraid, be very afraid" about the wording,
unintentional I'm sure. "That work has not been reviewed by the IETF",
"or for being inherently problematic", etc.
Part of it is the amount of text, rather than the content. Section 3
starts with four wall-of-text paragraphs.
I suggest removing quite a bit. This, for example:
The important example for
the root zone is the ICANN Maximal Starting Repertoire 5 (MSR-5) for
the Development of Label Generation Rules for the Root Zone
[ICANN-MSR5] (or its successor documents).
The root zone is a special case, not terribly important for registries
operating second-level zones. Also, the MSR is produced from the LGRs, so
"for" is wrong. Why does this paragraph mention "consult carefully
developed consensus recommendations" without mentioning such things as
the Devanagari LGR by name?
Then later:
That
work has not been reviewed by the IETF and is not part of the set of
IDNA Standards that this document updates. The ICANN work in this
area is ongoing and it, and the context and methods involved, are
described in a separate document [LGR-forward-reference].
The IETF doesn't generally review someone else's work, there's no need
to call that out, all it does is scare some readers. BTW, I hope
nobody's planning to keep this RFC in the editor's queue until
[LGR-forward-reference] materialises ;)
The four paragraphs don't please me either. What's there isn't
totally wrong, but misunderstands more than an RFC should and the result
looks overcomplicated. "A registry decision to allow only those code
points in the full repertoire of the MSR (plus digits and hyphen)..." is
basically saying that if you start with the MSR, add rules, remove
codepoints and make judgment calls, then you can produce something
similar to one or more of the LGRs from which MSR was produced.
I suggest dropping all of those four paragraphs and replacing them with
one or two sentences about the existence of the language- or
script-specific LGRs and then advice that "registries are advised to
ensure that each single label obeys the rules of a single LGR, and
choose which LGRs to allow in each TLD". Maybe add: "Another option for
Arabic or Cyrillic is to base the rules on RFCs 5564/5992" and/or a
remark that some registries have chosen to use a small set that
accidentally happens to be a subset of the Latin LGR.
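The "each single label obeys the rules of a single LGR" advice can be sketched roughly as follows. This is a toy illustration under loud assumptions: the repertoires below are invented stand-ins and are NOT real LGRs, the names TOY_REPERTOIRES, ALWAYS_ALLOWED and matching_repertoires are hypothetical, and real LGRs also carry whole-label rules and variant sets that this sketch ignores.

```python
# Toy repertoires, NOT real LGRs: each name maps to a set of allowed code points.
TOY_REPERTOIRES = {
    # a-z plus three Nordic letters (ae, o-slash, a-ring), chosen arbitrarily
    "latin-toy": set(range(ord("a"), ord("z") + 1)) | {0x00E6, 0x00F8, 0x00E5},
    # basic lowercase Cyrillic block U+0430..U+044F
    "cyrillic-toy": set(range(0x0430, 0x0450)),
}

# Digits and hyphen are treated as allowed alongside any repertoire.
ALWAYS_ALLOWED = set(range(ord("0"), ord("9") + 1)) | {ord("-")}


def matching_repertoires(label: str, enabled: list) -> list:
    """Return the enabled repertoires that cover every code point of label.

    An empty result means no single repertoire covers the whole label,
    for example because it mixes scripts, so a registry following the
    single-LGR advice would reject it.
    """
    code_points = {ord(ch) for ch in label} - ALWAYS_ALLOWED
    return [name for name in enabled if code_points <= TOY_REPERTOIRES[name]]


if __name__ == "__main__":
    enabled = ["latin-toy", "cyrillic-toy"]
    for candidate in ["blåbær", "почта", "p\u0430ypal"]:
        print(repr(candidate), matching_repertoires(candidate, enabled))
```

The third candidate contains a Cyrillic U+0430 among Latin letters, so no single toy repertoire covers it. The "enabled" list stands in for the registry's choice of which LGRs to allow in a given TLD.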
Following that there's a paragraph about "registries choosing to make
exceptions". It seems a bit long and convoluted to me. Maybe:
"Registries may choose to allow additional code points beyond those
mentioned in the relevant LGRs, but should do so only if they understand
well why these code points were not included in the relevant LGR." If
you want an example so worriers won't be left worrying, then perhaps
add: "(For example, the rune U+16C1 is not included in any LGR since
each LGR serves a currently used language or script, and no living
community uses runes. U+2006 is not included because ...)" blah blah.
It might be good if section 3 were to start with a single paragraph
describing best current practice, to avoid creating that be-very-afraid
effect with those readers who already worry.
Section 4... what?
Section 4 starts with a long headline about "benefit of the registry
owner". I don't see the point of either the headline or the section. Is
it saying that if you profit from each domain registered, you don't want
to turn down people who want to register shady domains? If so, I guess
it's correct so long as the costs of handling abuse can be kept at zero,
but keeping those costs at or near zero seems unrealistic, so I'm
apprehensive about publishing it as an RFC.
A possibly relevant comment: Each LGR was developed by a community
committee to suit that community's writing. It follows that if
you want to register a domain that's legible for a particular community,
the relevant LGR is intended to cover your use case. It also follows
that if you want to register something that no LGR covers, then you're
either going to have legibility problems, or you're a bad guy trying to
register an impersonation domain, or finally you're a nerd with a
Tibetan, runic or Klingon domain (I have a fine runic domain). I do like
it when registries cater to nerds like me, but I also suspect that
beyond-LGR domains cause more than their fair share of support/abuse
issues. Hearsay has it that there's been too little of this to really
estimate costs. I suspect, I don't estimate.
Arnt
--
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx