[Last-Call] Re: Last Call: <draft-klensin-idna-rfc5891bis-07.txt> (Internationalized Domain Names in Applications (IDNA): Registry Restrictions and Recommendations) to Proposed Standard

Joseph Yee <joseph.yee@xxxxxxxxx> · Tue, 15 Oct 2024 18:34:11 -0400

Hi all,
I read this draft and share the concerns Arnt brought up, so I am not going to repeat them here.

I think this draft should be Informational, since it offers recommendations and guidance for registry operators without changing IDNA2008.  It fits better as a standalone Informational document or as RFC5894bis than as RFC5891bis.  The only change made to RFC 5891 is to address the errata on the relative label length of U-labels and A-labels.  The draft tightens the language, terminology, and definitions used to describe label length, but that has nothing to do with advising registry operators on the selection of Unicode code points.
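
To make the label-length point concrete, here is a minimal Python sketch. It assumes only Python's built-in "punycode" codec, performs no IDNA2008 validity checks, and the helper names (to_a_label, label_lengths) are mine, not anything from the draft or the RFCs. It just shows that one label has three different lengths depending on what is counted.

```python
DNS_LABEL_MAX_OCTETS = 63  # DNS limit on a single label, applies to the A-label


def to_a_label(u_label: str) -> str:
    """Encode a label with the built-in Punycode codec and add the ACE prefix.

    Illustration only: no IDNA2008 checks (PVALID, contextual rules, Bidi).
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


def label_lengths(u_label: str) -> dict:
    """Return the different 'lengths' of one label side by side."""
    a_label = to_a_label(u_label)
    return {
        "code_points": len(u_label),                  # length of the U-label in code points
        "utf8_octets": len(u_label.encode("utf-8")),  # length of the U-label in UTF-8
        "a_label_octets": len(a_label),               # length of the A-label
        "fits_dns": len(a_label) <= DNS_LABEL_MAX_OCTETS,
    }


print(to_a_label("bücher"))     # xn--bcher-kva
print(label_lengths("bücher"))
```

Which of these counts a given sentence means is the kind of thing the tightened terminology is about; it is independent of any registry's choice of code points.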

This draft intends to update RFC 5890, 5891, and 5894, but section 5 only covers the updates to RFC 5890 and 5891; it does not say where it updates RFC 5894.  I take it that the target is section 3.2 of RFC 5894, but I see this draft as an expansion of, or elaboration on, registry policy rather than an update to it.

Best,
Joseph

On Mon, Oct 14, 2024 at 7:59 AM Arnt Gulbrandsen <arnt@xxxxxxxxxxxxxxxxxxx> wrote:
I wrote about sections 3 and 4:

> The advice doesn't seem awfully contentious. One can write
> IDNA2008-compliant code without this advice, though, so it's not
> obvious to me that it ought to be included in a document about
> IDNA2008.

If included, then I have qualms about the wording of section 3 and
whether to include the costs of abuse handling in section 4.

Section 3, advice for all TLD registries.

There's a touch of "be afraid, be very afraid" about the wording,
unintentional I'm sure. "That work has not been reviewed by the IETF",
"or for being inherently problematic", etc.

Part of it is the amount of text, rather than the content. Section 3
starts with four wall-of-text paragraphs.

I suggest removing quite a bit. This, for example:

   The important example for
   the root zone is the ICANN Maximal Starting Repertoire 5 (MSR-5) for
   the Development of Label Generation Rules for the Root Zone
   [ICANN-MSR5] (or its successor documents).

The root zone is a special case, not terribly important for registries
operating second-level zones. Also, the MSR is produced from the LGRs;
"for" is wrong. Why does this paragraph mention "consult carefully
developed consensus recommendations" without mentioning such things as
the Devanagari LGR by name?

Then later:

   That
   work has not been reviewed by the IETF and is not part of the set of
   IDNA Standards that this document updates.  The ICANN work in this
   area is ongoing and it, and the context and methods involved, are
   described in a separate document [LGR-forward-reference].

The IETF doesn't generally review someone else's work, so there's no
need to call that out; all it does is scare some readers. BTW, I hope
nobody's planning to keep this RFC in the editor's queue until
[LGR-forward-reference] materialises ;)

The four paragraphs don't please me either. What's there isn't
totally wrong, but it misunderstands more than an RFC should, and the
result looks overcomplicated. "A registry decision to allow only those
code points in the full repertoire of the MSR (plus digits and
hyphen)..." is basically saying that if you start with the MSR, add
rules, remove codepoints and make judgment calls, then you can produce
something similar to one or more of the LGRs from which the MSR was
produced.

I suggest dropping all of those four paragraphs and replacing them with
one or two sentences about the existence of the language- or
script-specific LGRs and then advice that "registries are advised to
ensure that each single label obeys the rules of a single LGR, and
choose which LGRs to allow in each TLD". Maybe add: "Another option for
Arabic or Cyrillic is to base the rules on RFCs 5564/5992" and/or a
remark that some registries have chosen to use a small set that
accidentally happens to be a subset of the Latin LGR.
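
A rough Python sketch of what "each single label obeys the rules of a
single LGR" could look like, under loud assumptions: the two repertoires
below are tiny made-up stand-ins (not real LGR data), real LGRs also carry
variant and whole-label rules that this ignores, and the names
REPERTOIRES, COMMON, matching_repertoires and is_registrable are
hypothetical.

```python
import string

# Hypothetical, heavily truncated repertoires for illustration only.
REPERTOIRES = {
    "latin-demo": set(string.ascii_lowercase) | set("\u00e6\u00f8\u00e5"),  # a-z plus æ ø å
    "cyrillic-demo": {chr(cp) for cp in range(0x0430, 0x0450)},            # U+0430..U+044F
}
COMMON = set(string.digits) | {"-"}  # digits and hyphen, usable with any repertoire


def matching_repertoires(label, allowed):
    """Return the allowed repertoires that cover every code point of the label."""
    if not label:
        return []
    return [
        name
        for name in allowed
        if all(ch in REPERTOIRES[name] or ch in COMMON for ch in label)
    ]


def is_registrable(label, allowed):
    """A label passes only if at least one single repertoire covers it entirely."""
    return bool(matching_repertoires(label, allowed))


zone_policy = ["latin-demo", "cyrillic-demo"]  # the repertoires this zone chose to allow

print(matching_repertoires("bl\u00e5b\u00e6r", zone_policy))  # covered by latin-demo only
print(is_registrable("p\u0430ypal", zone_policy))            # Latin letters plus Cyrillic U+0430
```

The point of the design is that the mixed-script label fails without any
special confusables logic: no single repertoire covers all of its code
points.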

Following that there's a paragraph about "registries choosing to make
exceptions". It seems a bit long and convoluted to me. Maybe:
"Registries may choose to allow additional code points beyond those
mentioned in the relevant LGRs, but should do so only if they understand
well why these code points were not included in the relevant LGR." If
you want an example so worriers won't be left worrying, then perhaps
add: "(For example, the rune U+16C1 is not included in any LGR since
each LGR serves a currently used language or script, and no living
community uses runes. U+2006 is not included because ...)" blah blah.

It might be good if section 3 were to start with a single paragraph
describing best current practice, to avoid creating that be-very-afraid
effect with those readers who already worry.

Section 4... what?

Section 4 starts with a long headline about "benefit of the registry
owner". I don't see the point of either the headline or the section. Is
it saying that if you profit from each domain registered, you don't want
to turn down people who want to register shady domains? If so, I guess
it's correct so long as the costs of handling abuse can be kept at zero,
but keeping those costs at or near zero seems unrealistic, so I'm
apprehensive about publishing it as an RFC.

A possibly relevant comment: Each LGR was developed by a community
committee to suit that community's writing. From that it follows that if
you want to register a domain that's legible for a particular community,
the relevant LGR is intended to cover your use case. From this it follows
that if you want to register something that no LGR covers, then you're
either going to have legibility problems, or you're a bad guy trying to
register an impersonation domain, or finally you're a nerd with a
Tibetan, runic or Klingon domain (I have a fine runic domain). I do like
it when registries cater to nerds like me, but I also suspect that
beyond-LGR domains cause more than their fair share of support/abuse
issues. Hearsay has it that there's been too little of this to really
estimate costs. I suspect, I don't estimate.

Arnt

-- 
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx
