Re: Last Call: <draft-ietf-urnbis-rfc2141bis-urn-20.txt> (Uniform Resource Names (URNs)) to Proposed Standard

--On Tuesday, February 21, 2017 16:04 +0000 "tom p."
<daedulus@xxxxxxxxxxxxx> wrote:

> OK on 'Updates'
> 
> Meanwhile,
> s.2.2 'the basic Latin repertoire [RFC20] '
> I don't know what you mean - RFC20 does not use the term 'Latin
> repertoire' and, being European, I tend to think of the Latin
> repertoire as being the 160 or so characters of the Belgian
> language.

Mea culpa.  I knew that unqualified reference was going to get
us into trouble and let it go.  Whether you (and future readers
of 2141bis) should have known this (European or not) is
debatable,  but "Basic Latin" is a term of art in
internationalization work and "repertoire" keeps us away from
debates about particular coded character sets, encodings, etc.,
and, in particular, avoids confusion with %-encoded arbitrary
Unicode characters (see the introduction to Section 2) which are
certainly still ASCII characters.  The definition of "basic
Latin repertoire"  is more or less equivalent to "undecorated
Latin characters", but that terminology raises other issues...
to the point that I'm tempted to say "can't win no matter what
one does".    

Personally, I've always objected to "Basic Latin" because
several characters we consider to be part of the repertoire
today (e.g., "w" and distinct "j" and "u") were late additions
and would not have been part of the writing system as distinct
characters for writers of Latin in the time of, e.g., Virgil.
But that argument was lost long ago and "basic Latin" is the
prevailing terminology today.

> I think you need to specify the code points here.

I think that would cause confusion with syntax rules that
specify what is possible.   The paragraph containing the phrase
you picked up simply imposes a "SHOULD NOT" restriction and
explains why it should be adhered to when possible and what the
exceptions should be. It really clarifies and is perhaps
partially redundant with the previous paragraph.  How would you
feel about making the phrase something closer to "the basic
Latin repertoire, i.e., the letters and digits of ASCII as
described above" and moving the RFC 20 citation to the first use
of "ASCII" in that previous paragraph?
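
To illustrate the distinction, here is a minimal Python sketch. It
assumes "basic Latin repertoire" is read as the ASCII letters and
digits only; the helper name in_basic_latin_repertoire is
hypothetical and is not taken from the draft.

```python
import string
from urllib.parse import quote

# Assumption: "basic Latin repertoire" = ASCII letters A-Z, a-z and digits 0-9.
BASIC_LATIN = frozenset(string.ascii_letters + string.digits)


def in_basic_latin_repertoire(text: str) -> bool:
    """Return True if every character is an ASCII letter or digit."""
    return all(ch in BASIC_LATIN for ch in text)


# Percent-encoding a non-ASCII character yields a string of ASCII characters...
encoded = quote("é")  # UTF-8 bytes C3 A9, percent-encoded
print(encoded, encoded.isascii())

# ...but that string is not confined to letters and digits, because of "%".
print(in_basic_latin_repertoire(encoded))
print(in_basic_latin_repertoire("example42"))
```

The point of the sketch is only that "is ASCII" and "is in the
letters-and-digits repertoire" are different tests, which is why the
two are kept apart in the text.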

Other textual suggestions welcome but, again, I think listing
code points would cause confusion with the formal syntax for
<namestring> and some of the carefully-constructed language
around it.

     john

p.s. personal note: 2141bis has either the advantage of, or
suffers from, the fact that Peter and I have been immersed in
internationalization (i18n) issues for the last several years
with various involvements in PRECIS, IDNA, SMTPUTF8 ("EAI"), and
other efforts.   That has probably resulted in our trying to be
a little more precise in this draft than many IETF documents
about URI schemes and naming and perhaps, as in this case,
having slightly higher expectations about community
understanding of those issues than might be the case.  I think
that precision is an advantage and hope that community knowledge
and understanding will gradually improve.   YMMV, but I hope
that anyone who is actually inclined to advocate for less
precision (you have not) will be careful about any advocacy,
even on a diversity basis, for IETF consideration of writing
systems that cannot be expressed in ASCII and/or any language
other than English.
 



