--On Monday, 03 January, 2005 12:29 -0800 Peter Constable <petercon@xxxxxxxxxxxxx> wrote:

>> From: John C Klensin [mailto:john-ietf@xxxxxxx]
>
>> Ignoring whether "that very nearly happened in RFC 3066", because some of us would have taken exception to inserting a script mechanism then, let's assume that 3066 can be characterized as a language-locale standard (with some funny exceptions and edge cases) and that the new proposal could similarly be characterized as a language-locale-script standard
>
> I can see we might run into some terminological hurdles here. I would decidedly *not* describe RFC 3066 as a "locale" standard just because it allows for tags that include country identifiers. I would strongly contend that a "language" tag and a "locale" ID are different things serving quite different purposes. But I'll read the rest of your comments assuming that by "language-locale(-script) standard" you simply mean a standard for language tags that can include subtags for region and script.

That is more than close enough for discussion purposes.

>> If one makes that assumption, then the (or a) framework for the answer to the question of what problem this solves that 3066 does not becomes clear: it meets the needs of when a language-locale-script specification is needed.
>>
>> But that takes us immediately to the comments Ned and I seem to be making, characterized especially by Ned's "sweet spot" remark. It has not been demonstrated that Internet interoperability generally, and the settings in which 3066 is now used in particular, require a language-locale-script set of distinctions.
>
> I disagree.
> There are many cases in which script distinctions in language tags have been recognized as being needed; several such tags have been registered for that reason already under the terms of RFC 3066, and there are more that would already have been registered except for the fact that people have been anticipating acceptance of this proposed revision. (For instance, in response to recent discussions, a representative of Reuters has indicated that he was holding off registering various language tags that include ISO 15924 script IDs on that basis, and that he plans to do so if this proposed revision is delayed much longer.)

It would be very helpful, to me at least, if you or he could identify the specific context in which such tags would be used and are required. The examples should ideally be of IETF-standard software, not proprietary products.

>> The document does not address that issue.
>
> That is probably because those of us who have been participants in the IETF-languages list, where this draft originated, have become so familiar with the need that it seems obvious -- evidently, it's not as obvious to people who have not been as focused on IT-globalization issues as we have.

How nice. In 2004, I discovered that I had no operational experience and then that I knew nothing about standardization processes outside the IETF. It is now only three days into 2005, and already I've learned that I haven't been focused on "IT globalization". I anxiously await the opportunity to find out what comes next in this sequence :-)

>> Equally important, but just as one example, in the MIME context (just one use of 3066, but a significant one), we've got a "charset" parameter as well as a "language" one. There are some odd new error cases if script is incorporated into "language" as an explicit component but is not supported in the relevant "charset". On the one hand, the document does not address those issues and that is, IMO, a problem.
>> But, on the other, no matter how they are addressed, the level of complexity goes up significantly.
>
> I don't see how such error cases are significantly different from current possibilities, such as having a language tag of "hi" and a charset of ISO 8859-1 (where the content actually uses some non-standard encoding for Devanagari).

Since I haven't paid attention to IT globalization and internationalization issues for the last 20 or 30 years, I obviously don't know enough about alphabetic equivalency relationships, the collection of TC 46 transliteration standards (including, in this case, the possibility that IS 15919 is in use), and related work to be able to address this question.

>> One can also raise questions as to whether, if script specifications are really needed, those should reasonably be qualifiers or parameters associated with "charset" or "language" (and which one) rather than incorporated into the latter. I don't have any idea what the answer to those questions ought to be.
>
> Having worked on these particular issues for several years, I and many others feel we *do* have an idea what the answer to those questions ought to be -- that script should be incorporated as a permitted subtag within a language tag.

Good. See the request for explanation and examples above. Things that you and your colleagues know, but that aren't in the draft or some supplemental and equally accessible document, are really not helpful.

>> But they are fairly subtle, the document doesn't address them (at least as far as I can tell), and I see no way to get to answers to them without a lot more specificity about what real internetworking or interoperability problem you are trying to solve.
>
> Some days ago, I made reference to a white paper I wrote a few years ago that explores the kinds of distinctions that need to be made in metadata elements declaring linguistic attributes of information objects.
> It's long, and there are some details I'd change, but that may provide a starting point. The people who have contributed to this draft are all familiar with these ideas. You can find this paper at http://www.sil.org/silewp/abstract.asp?ref=2002-003. Granted, this paper does not go into details regarding specific implementations.

I've just now skimmed parts of this paper. It is very interesting and I look forward to carefully reading the rest of it. We are in agreement about your category model. The only place where there is a difference is whether, for the purposes of the IETF and the actual demands on RFC 3066, something else --and something as complex as I perceive this proposal as being-- is really needed.

I can, for the record, believe that this proposal is unnecessary and too complex while also believing that it is possible to make registrations under the rules of 3066 that would make quite a mess of things. We have tag review processes to prevent just that eventuality. I can also believe that 3066 represents a compromise, rather than a perfect solution to the issues you outline in your paper, without believing that that translates into either a problem that needs to be solved or a problem that needs to be solved with this particular proposal. I've got a fairly open mind on those subjects; I just believe that the burden of demonstrating that a major change is needed in a system that appears to be working is, and should be, fairly high.

>> Similarly, as we know, painfully, from other internationalization efforts, the only comparisons that are easy involve bit-string identity. Working out, at an application level, when two "languages" under the 3066 system are close enough that the differences can be ignored for practical purposes is quite uncomfortable.
>> Attempting similar logic for this new proposal is mind-boggling, especially if one begins to contemplate comparison of a language-locale specification with a language-script one -- a situation that I believe, from reading the spec, is easily possible.
>
> RFC 3066 makes reference to a fairly simplistic matching algorithm using the notion of language-range. The proposed draft would continue to support that same algorithm, with an expectation that implementations of language-range matching as defined in RFC 3066 would continue to operate using exactly the same algorithm on new tags permitted by the proposed revision -- and with generally desirable results.
>
> There may be implementations that use a more complex approach to matching involving inspection of the tagged content itself, or inspecting the particular subtags of a language tag.
>...

Peter, you are talking, I think, about different applications doing different things, given the greater range of options and flexibility that the new specification provides. From my point of view and experience, every time someone says "well, some applications may do something else" or "some implementations may use a more complex approach", what I hear is more potential for ways in which things won't interoperate, more areas in which profiles are needed to assure interoperability, and so on. Whether the interoperability issues show up at a protocol level or to the user as a violation of the law of least astonishment makes little difference: such things make the Internet work less well and should be avoided unless there is a _really_ strong reason for them. What I'm trying to probe here are those reasons.

>...

Let me also comment on the ISO 3166 issues here, rather than starting another note. For me, there is no question that 3166/MA has made quite a mess of things with a few of their reuse decisions, most notably the recent assignment of CS to Serbia and Montenegro.
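For concreteness, the "fairly simplistic" language-range matching that RFC 3066 defines is case-insensitive prefix matching on subtag boundaries: a range matches a tag if it equals the tag or is a prefix of it ending where a "-" begins, with "*" matching everything. A minimal sketch (the function name and test tags are mine, not from the draft):

```python
def matches(language_range: str, tag: str) -> bool:
    """RFC 3066 matching: a language-range matches a language-tag if,
    compared case-insensitively, it exactly equals the tag, or exactly
    equals a prefix of the tag such that the first tag character
    following the prefix is "-".  The range "*" matches every tag."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")

# The same prefix rule, applied unchanged to tags carrying
# region or ISO 15924 script subtags:
assert matches("en", "en-GB")        # region subtag
assert matches("sr", "sr-Latn")      # script subtag
assert not matches("en-GB", "en")    # range longer than tag: no match
assert not matches("zh", "zho")      # prefix must end on a subtag boundary
```

Note that this rule treats a script subtag and a region subtag identically: "sr" matches both "sr-Latn" and "sr-CS", which is what makes comparing a language-script tag against a language-locale tag the open question raised above.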
In the pre-ICANN period, IANA had fairly well-considered procedures for dealing with code changes, and I have been appalled that ICANN has sometimes felt a need to ignore those precedents in favor of believing that it needs to consider ccTLD changes any time 3166/MA makes a change. But the solution to the problem of various ISO TCs not having an adequate understanding of the impact on the Internet and IT communities (and, in the case of TC 46, even the library/information-sciences community that is one of their historical main constituencies) is, IMO, to get that message across via liaison statements and, if necessary and appropriate, by encouraging national member bodies to cast "no" votes on standards and registration procedures that are insufficiently stable. After the "CS" decision, the statements from the British Library advocating a much longer time-to-reuse and from the IAB suggesting that a century might be adequate were, again IMO, just the right sort of approach.

In particular, I presume that TC 37 has an adequate liaison mechanism in place with TC 46 to insist that a much more conservative position be adopted with regard to changes. If TC 37 isn't able or inclined to do that job effectively, I'm not persuaded that shifting the task to the IETF is an appropriate solution or one that is likely to be effective.

As I have noted in other contexts, an attitude in the Internet community that extreme stability in external standards is critical is not a new development, as evidenced by our continued use of ANSI/X3.4-1968 as the base reference for "US-ASCII", just as our response to some incompatible changes in Unicode between 3.2 and 4.0 has been to freeze some things at 3.2. Our solution has not been to try to create IETF standards to work around the stability issues in ISO (or other) standards. Down that path generally lies madness.
If it is really necessary --i.e., there are no other practical alternatives and we have the needed expertise-- then I think we should consider it, but that case has, IMO, not yet been made here.

My apologies but, since the Last Call is closing and there is supposed to be a -09 coming, I don't believe that it is useful to continue this discussion much further until the IESG has made some decisions about what should be done next and told the community about them.

    john

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf