--On Monday, 03 January, 2005 09:58 -0800 Peter Constable
<petercon@xxxxxxxxxxxxx> wrote:

>> From: John C Klensin <john-ietf@xxxxxxx>
>
>> (iii) One way to read this document, and 3066 itself for
>> that matter, is that they constitute a critique of IS
>> 639 in terms of its adequacy for Internet use.
>
> Not exactly. It reflects that ISO 639 alone does not support
> all of the linguistically-related distinctions that need to be
> declared about content on the Internet -- something that ISO
> 639 itself acknowledges (in general, not just in relation to
> the Internet).
>...
> Thus, I would not describe this as a critique of ISO 639. It
> is simply a recognition that ISO 639 itself makes that there
> are language distinctions that often need to be made that ISO
> 639 itself does not make.

Peter,

What I said was "critique of ISO 639 in terms of its adequacy
for Internet use" and not "general critique of ISO 639". I
think, despite differences in choice of language, your note says
much the same thing. So, unless I profoundly misunderstand your
note, we are in agreement on that subject.

But let me, reluctantly, move on to substance at a slightly
higher level of abstraction than has characterized most of the
discussion so far. The reluctance is due to the statement that
there was going to be another revision. We normally don't do
that in the IETF: Last Calls are supposed to be about documents
that are proposed for publication and, IMO, the IESG should have
terminated the Last Call the moment the statement was made that
a revision to address some of Bruce's comments was in progress.

You observe that...

> Just as RFC 1766/3066 also use ISO 3166 country codes to make
> sub-language distinctions (e.g. to distinguish vocabulary or
> spelling), so also there is a need to use ISO 15924 to
> distinguish between different written forms of a given
> language.
> The proposed draft incorporates ISO 15924 --
> something that very nearly happened in RFC 3066, but did not
> since ISO 15924 was still in process and (as I see it) those
> of us involved needed more time to evaluate the idea (which has
> happened in the years since then, to the point that we have
> confidence about this step).

Ignoring whether "that very nearly happened in RFC 3066",
because some of us would have taken exception to inserting a
script mechanism then, let's assume that 3066 can be
characterized as a language-locale standard (with some funny
exceptions and edge cases) and that the new proposal could
similarly be characterized as a language-locale-script standard
(and let's mostly ignore the question of whether there are funny
exceptions and edge cases).

If one makes that assumption, then the (or a) framework for the
answer to the question of what problem this solves that 3066
does not becomes clear: it meets the need in cases where a
language-locale-script specification is required. But that takes
us immediately to the comments Ned and I seem to be making,
characterized especially by Ned's "sweet spot" remark. It has
not been demonstrated that Internet interoperability generally,
and the settings in which 3066 is now used in particular,
require a language-locale-script set of distinctions. The
document does not address that issue.

Equally important, but just as one example, in the MIME context
(just one use of 3066, but a significant one), we've got a
"charset" parameter as well as a "language" one. There are some
odd new error cases if script is incorporated into "language" as
an explicit component but is not supported in the relevant
"charset". On the one hand, the document does not address those
issues and that is, IMO, a problem. But, on the other, no matter
how they are addressed, the level of complexity goes up
significantly.
One can also raise questions as to whether, if script
specifications are really needed, those should reasonably be
qualifiers or parameters associated with "charset" or "language"
(and which one) rather than incorporated into the latter. I
don't have any idea what the answer to those questions ought to
be. But they are fairly subtle, the document doesn't address
them (at least as far as I can tell), and I see no way to get to
answers to them without a lot more specificity about what real
internetworking or interoperability problem you are trying to
solve.

Similarly, as we know, painfully, from other
internationalization efforts, the only comparisons that are easy
involve bit-string identity. Working out, at an application
level, when two "languages" under the 3066 system are close
enough that the differences can be ignored for practical
purposes is quite uncomfortable. Attempting similar logic for
this new proposal is mind-boggling, especially if one begins to
contemplate comparison of a language-locale specification with a
language-script one -- a situation that I believe from reading
the spec is easily possible. That situation almost invites
profiling of how this specification should be used in different
circumstances, and I don't think we want to go there unless
there is no alternative. Better two different
language-identification specifications for different,
clearly-delimited, purposes (which was, more or less, one of my
alternative options).

The academic and theoretician in me really likes this system.
It is elegant and comprehensive in ways that 3066 is not. But I
try to keep my focus around IETF fairly pragmatic.
From a pragmatic standpoint, it remains unclear what problem is
being solved here and hence whether that problem is important
enough to justify either the incompatibility and transition
problems the proposal would cause or the potential for greater
complexity, and especially false negatives and positives on
"close enough" comparisons, that comes with it.

So my conclusion, at least so far, is that the ability to
specify a system at this level of precision does not imply that
it is desirable to do so as a replacement for 3066, when 3066
seems to mostly be serving its intended purposes.

regards,
    john

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf