RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)

John C Klensin <john-ietf@xxxxxxx> · Mon, 03 Jan 2005 14:37:17 -0500

--On Monday, 03 January, 2005 09:58 -0800 Peter Constable
<petercon@xxxxxxxxxxxxx> wrote:

>> From: John C Klensin <john-ietf@xxxxxxx>
> 
>> 	(iii) One way to read this document, and 3066 itself for
>> 	that matter, is that they constitute a critique of IS
>> 	639 in terms of its adequacy for Internet use.
> 
> Not exactly. It reflects that ISO 639 alone does not support
> all of the linguistically-related distinctions that need to be
> declared about content on the Internet -- something that ISO
> 639 itself acknowledges (in general, not just in relation to
> the Internet). 
>...
> Thus, I would not describe this as a critique of ISO 639. It
> is simply a recognition that ISO 639 itself makes that there
> are language distinctions that often need to be made that ISO
> 639 itself does not make.

Peter,

What I said was "critique of ISO 639 in terms of its adequacy
for Internet use" and not "general critique of ISO 639".  I
think, despite differences in choice of language, your note says
much the same thing.   So, unless I profoundly misunderstand
your note, we are in agreement on that subject.

But let me, reluctantly, move on to substance at a slightly
higher level of abstraction than has characterized most of the
discussion so far.   The reluctance is due to the statement that
there was going to be another revision.  We normally don't do
that in the IETF: Last Calls are supposed to be about documents
that are proposed for publication and, IMO, the IESG should have
terminated the Last Call the moment the statement was made that
a revision to address some of Bruce's comments was in progress.

You observe that...

> Just as RFC 1766/3066 also use ISO 3166 country codes to make
> sub-language distinctions (e.g. to distinguish vocabulary or
> spelling), so also there is a need to use ISO 15924 to
> distinguish between different written forms of a given
> language. The proposed draft incorporates ISO 15924 --
> something that very nearly happened in RFC 3066, but did not
> since ISO 15924 was still in process and (as I see it) those
> of us involved needed more time to evaluate the idea (which has
> happened in the years since then, to the point that we have
> confidence about this step).

Ignoring whether "that very nearly happened in RFC 3066",
because some of us would have taken exception to inserting a
script mechanism then, let's assume that 3066 can be
characterized as a language-locale standard (with some funny
exceptions and edge cases) and that the new proposal could
similarly be characterized as a language-locale-script standard
(and let's mostly ignore the question of whether there are funny
exceptions and edge cases).  If one makes that assumption, then
the (or a) framework for the answer to the question of what
problem this solves that 3066 does not becomes clear: it serves
the cases in which a language-locale-script specification is
needed.

But that takes us immediately to the comments Ned and I seem to
be making, characterized especially by Ned's "sweet spot"
remark.  It has not been demonstrated that Internet
interoperability generally, and the settings in which 3066 is
now used in particular, require a language-locale-script set of
distinctions.   The document does not address that issue.
Equally important, but just as one example, in the MIME context
(just one use of 3066, but a significant one), we've got a
"charset" parameter as well as a "language" one.   There are
some odd new error cases if script is incorporated into
"language" as an explicit component but is not supported in the
relevant "charset".  On the one hand, the document does not
address those issues and that is, IMO, a problem.  But, on the
other, no matter how they are addressed, the level of complexity
goes up significantly.  
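
As an illustration only of the kind of error case meant here (it is
not taken from the draft or from 3066), the sketch below checks
whether a script subtag in "language" can be carried at all by the
declared "charset". The SAMPLE_CHAR table and both function names are
hypothetical, and the script subtag is detected by a simplified rule
(a four-letter subtag after the first).

```python
# Hypothetical table: one representative character per script subtag.
# Illustrative only; a real check would need far more than one sample.
SAMPLE_CHAR = {
    "latn": "a",
    "cyrl": "\u0436",  # CYRILLIC SMALL LETTER ZHE
    "grek": "\u03b1",  # GREEK SMALL LETTER ALPHA
    "hant": "\u6f22",  # a CJK ideograph
}


def script_subtag(tag):
    """Return the first four-letter subtag after the language, or None.

    Simplified assumption: any four-letter alphabetic subtag is a script.
    """
    for sub in tag.lower().split("-")[1:]:
        if len(sub) == 4 and sub.isalpha():
            return sub
    return None


def charset_can_carry(tag, charset):
    """True/False if the charset can or cannot encode the sample character
    for the tag's script; None if the tag names no known script."""
    script = script_subtag(tag)
    if script is None or script not in SAMPLE_CHAR:
        return None
    try:
        SAMPLE_CHAR[script].encode(charset)
    except UnicodeEncodeError:
        return False
    return True


print(charset_can_carry("sr-Cyrl", "iso-8859-1"))  # language and charset disagree
print(charset_can_carry("sr-Cyrl", "iso-8859-5"))
print(charset_can_carry("en-US", "us-ascii"))       # no script subtag to check
```

What a receiver should do when the first case turns up (reject,
ignore the script, ignore the charset) is exactly the kind of
question the sketch cannot answer.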

One can also raise questions as to whether, if script
specifications are really needed, those should reasonably be
qualifiers or parameters associated with "charset" or "language"
(and which one) rather than incorporated into the latter.  I
don't have any idea what the answer to those questions ought to
be.  But they are fairly subtle, the document doesn't address
them (at least as far as I can tell), and I see no way to get to
answers to them without a lot more specificity about what real
internetworking or interoperability problem you are trying to
solve.

Similarly, as we know, painfully, from other
internationalization efforts, the only comparisons that are easy
involve bit-string identity.  Working out, at an application
level, when two "languages" under the 3066 system are close
enough that the differences can be ignored for practical
purposes is quite uncomfortable.   Attempting similar logic for
this new proposal is mind-boggling, especially if one begins to
contemplate comparison of a language-locale specification with a
language-script one -- a situation that I believe from reading
the spec is easily possible.  That situation almost invites
profiling of how this specification should be used in different
circumstances, and I don't think we want to go there unless
there is no alternative.   Better two different
language-identification specifications for different,
clearly-delimited, purposes (which was, more or less, one of my
alternative options).
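
To make the comparison problem concrete, here is a minimal sketch
contrasting three tests: case-insensitive identity, the prefix rule
that 3066 describes for language-ranges, and one possible "close
enough" policy. The close_enough policy and the simplified parse_tag
rule (four letters means script, two letters after the first subtag
means region) are assumptions made for illustration, not anything the
draft specifies.

```python
def parse_tag(tag):
    """Split a tag into (language, script, region) under a simplified,
    assumed rule: 4-letter subtag = script, later 2-letter subtag = region."""
    parts = tag.lower().split("-")
    language, script, region = parts[0], None, None
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha() and script is None and region is None:
            script = sub
        elif len(sub) == 2 and sub.isalpha() and region is None:
            region = sub
    return language, script, region


def identical(a, b):
    """The easy case: case-insensitive string identity."""
    return a.lower() == b.lower()


def prefix_match(lang_range, tag):
    """3066-style range matching: equal, or a prefix followed by '-'."""
    r, t = lang_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def close_enough(a, b):
    """One arbitrary policy: same language, and no subtag that is present
    in BOTH tags conflicts. Other policies are equally defensible."""
    la, sa, ra = parse_tag(a)
    lb, sb, rb = parse_tag(b)
    if la != lb:
        return False
    if sa and sb and sa != sb:
        return False
    if ra and rb and ra != rb:
        return False
    return True


# language-locale compared with language-script
print(prefix_match("zh-TW", "zh-Hant"))   # prefix rule sees no relation
print(close_enough("zh-TW", "zh-Hant"))   # this policy accepts the pair
print(close_enough("zh-TW", "zh-Hans"))   # and this one too, arguably wrongly
```

Under this particular policy "zh-TW" is accepted against both
"zh-Hant" and "zh-Hans", since a region and a script never conflict
directly. Whether the second acceptance is a false positive depends on
knowledge the tags themselves do not carry.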

The academic and theoretician in me really likes this system.
It is elegant and comprehensive in ways that 3066 is not.  But I
try to keep my focus around IETF fairly pragmatic.  From a
pragmatic standpoint, it remains unclear what problem is being
solved here and hence whether that problem is important enough
to justify either the incompatibility and transition problems
the proposal would cause or the potential for greater
complexity, and especially false negatives and positives on
"close enough" comparisons, that comes with it.   So my
conclusion, at least so far, is that the ability to specify a
system at this level of precision does not imply that it is
desirable to do so as a replacement for 3066, when 3066 seems to
mostly be serving its intended purposes.

regards,
    john
