There is a fundamental misunderstanding on two points.

1. Of course countries go in and out of existence, and change their
borders; nobody disputes that. That is not the stability problem in
question; it is where the meaning of tags changes so drastically as to
refer to a completely different country. One can't willy-nilly change
data that has significant effects on databases all over the world; when
someone's birthplace is indicated by a stored country code, for example,
it mustn't suddenly designate a different country! For more, see
http://www.unicode.org/consortium/positions.html.

2. The fact that the 3066 registry is not in multiple languages (either
currently or in the new draft) has nothing to do with any alleged
discouragement of any language, French included. The names in the
registry are simply to distinguish and identify the subtags, not to
provide recommended localizations. The registry, and for that matter the
ISO 639/3166 standards, are the wrong place for localization data. The
language coverage (only 2!) is a very small fraction of what is really
needed for any real product development -- and even for those languages
that are present, the names used there are not optimal for user
interfaces since they are sometimes not the customary form. For an
example of a data repository that is designed for localization of
language/region names, see http://www.unicode.org/cldr/.

Mark

----- Original Message -----
From: "Bruce Lilly" <blilly@xxxxxxxxx>
To: <ietf-languages@xxxxxxxxxxxxx>
Cc: <ietf@xxxxxxxx>
Sent: Sunday, December 12, 2004 08:46
Subject: Re: New Last Call: 'Tags for Identifying Languages' to BCP

> Date: 2004-12-10 22:37
> From: John Cowan <jcowan@xxxxxxxxxxxxxxxxx>
>
> Bruce Lilly scripsit:
>
> > It's not clear to me that the proposal will provide protection
> > against the whims of politicians.
> > If the definition of "CS" as
> > a country code changes again under the proposed scheme,
> > how is one to determine specifically what some archived
> > language-tag referred to at some point in time? I'm not
> > particularly concerned about that problem, as I am resigned
> > to instability associated with anything specified by politicians
> > (and that includes the UN region codes).
>
> The U.N. Statistics Division are only "politicians" in the sense
> that IETF WG members are. They are, in fact, statisticians.
> Their track record for stability is considerably longer than the
> IETF's.

I hope that I need not repeat any of the well-known remarks about
"statistics". Nor that I need point to the many uses by politicians of
statistics (and statisticians) for political purposes.

Moreover, the point is that countries do change, and that use of country
codes (as provided for in RFC 3066 and in the proposed draft) carries
with it the inherent instability which is characteristic of politics. A
quest for "stability" of countries seems Quixotic and oxymoronic.
According to the principle of stability as that term is used in defense
of the draft, I suppose we're all intended to refer to Malawi as
"Rhodesia" because that's what it (in part) was called 50 years ago, or
that we're supposed to ignore the breakup of the USSR, Yugoslavia, etc.,
the reunification of Germany, etc.

A related problem with the use of country codes in language tags is that
there is not necessarily an inherent relationship between language and
country borders. The borders of Germany have changed many, many times.
If one is referring to the German language as spoken by inhabitants of
Alsace, using country codes would imply that that same language spoken
by the same people would have been tagged at various times as de-DE and
de-FR according to where the France-Germany border happened to have been
determined by politicians of the time.
That strikes me as being a rather silly way to tag language, but that's
the precedent set by RFC 1766.

As far as I can tell, the draft doesn't really deal with the issue of
changing borders or changing country names -- it merely pretends that
these things don't happen by attempting to declare a snapshot of the
status at some point in time as being valid for all time.

> > But if the proposed new registry's description of "CS" says
> > "foo" and the ISO standard code list says "bar", what's
> > an implementor supposed to present to a user as *the*
> > description associated with "CS"?
>
> The former. That's the whole point of having a registry.

But the user has indicated that he speaks French, and the proposed
registry contains a description in English only. Where is the
implementor supposed to get the *official* translation for display?
N.B. under the current (RFC 3066) situation, the definitive ISO lists
provide an official description in French.

> > One possibility would be two description fields.
>
> Why two?

There are now two in the ISO lists (and, as noted, in the UN list). I
have no objection to more, but I object to a reduction.

The text accompanying the new last call states:

  "This specification addresses each of these issues with a simple,
  elegant design that is compatible with existing language tags and
  implementations."

and

  "One concern that is crucial to acceptance of the new language tag
  design is how it works with existing implementations of RFC 3066 and
  how existing implementations will interact with implementations of
  the newer language tags."

and

  "It is important to recognize that all language tags that were valid
  under the existing RFC 3066 will remain valid, with their meanings
  intact, under this specification."

I have an implementation which (in accordance with RFC 3066) uses the
official ISO lists.
It has provision for displaying ISO 639 language tags with their
descriptions in either of the two languages supported by the official
639 lists, and likewise for the ISO 3166 country codes. The
specification of the draft is *NOT* compatible with that existing
implementation because it removes the existing functionality of official
descriptions in French of language and country codes. As a result of
that incompatibility, the newly proposed specification does not work
with (at least that one) existing implementation (but I agree that that
is a crucial concern).

Language tags remaining valid, I presume that the tag "sr-CS" will
continue to mean Serbian as used in Serbia and Montenegro (officially
equivalent to Serbe par Serbie et Monténégro) as that is a valid
RFC 3066 language tag and its corresponding meaning... but I can see no
evidence of that in the draft -- indeed it appears that the draft would
change that meaning significantly.

> There are 6000 languages spoken on Earth, of which
> perhaps 600 have a standard written form.

ISO 639 lists about 650, not precisely 6000.

It might be worthwhile considering the differences in the way language
tags are used, by whom they are used, and for what purpose. There may
well be a substantial difference between use of a tag to represent an
obscure dialect of a dead language in a research paper vs. tagging a
piece of text in one of the core Internet protocols such as SMTP. The
draft seems to ignore the needs of the core Internet protocols (e.g. it
allows unbounded tag length, which is incompatible with those
protocols).

> What is supposed to
> be privileged about English and French?

They happen to be the languages in which international standards (q.v.
the ISO and UN lists) are published.
If one is going to use those standards (or a snapshot of them) as a
basis for subtags, then one ought to preserve the standardized
descriptions in the official languages of those standards rather than
discarding all but one of them in a fit of Anglo-centrism.

> > Eliminating bilingual descriptions for the language,
> > country (and UN region) codes leaves implementors
> > in a quandary.
>
> Only for those implementers to whom English and French, but
> no other language, is essential.

That is, implementors of RFC 3066, for which the relevant standards
provide official bilingual descriptions of the country and language
codes. The "new last call" text states that the draft proposal "is
compatible with" such implementations, but that compatibility is not
evident in the substance of the proposal.

> > ABNF from the draft:
>
> You're technically right, but your underlying claim (that RFC 3066 tags are
> bounded in length) is false, as has been shown

One part of my claim is that non-private-use RFC 3066 tags up to the
present time are no longer than 11 octets in length. As the draft,
if/when approved, would close that registration process, that limit
(unless a longer tag is registered in the interim) would apply for all
time.

The other part of my claim is that under the proposed scheme,
non-private-use tags become unbounded in length, and that is
incompatible with existing Standards Track RFCs (821, 822, 2821, 2822,
2047, 2231, 3282 among them) and the core Internet protocols which they
specify.

> and the "grandfathered"
> production is only used to match certain existing registered RFC 3066
> tags as they appear in the registry.
Then the ABNF for that production should match those "certain existing
registered RFC 3066 tags as they appear in the registry", and not match
unbounded-length subtags, non-alphabetic primary subtags, zero-length
subtags, dangling hyphens, etc.; I don't want that ABNF to be used as an
excuse for a future revision to introduce such constructs officially on
the basis that they are permitted by that ABNF.
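
To make the objection about over-permissive ABNF concrete, here is a
minimal sketch in Python of a checker that rejects the constructs listed
above (zero-length subtags, dangling hyphens, non-alphabetic primary
subtags, over-long subtags). It encodes only the generic RFC 3066 syntax
(a primary subtag of 1*8ALPHA followed by subtags of 1*8 letters or
digits); it is not the draft's "grandfathered" production, and it does
not consult any registry. The names RFC3066_TAG, CLAIMED_MAX_OCTETS,
is_rfc3066_syntax and fits_claimed_bound are hypothetical, and the
11-octet limit is simply the figure claimed in this message, not a value
taken from any specification.

```python
import re

# Generic RFC 3066 syntax only: a primary subtag of 1 to 8 letters,
# followed by zero or more subtags of 1 to 8 letters or digits,
# each introduced by a single hyphen.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Hypothetical bound: the claim in this message that registered,
# non-private-use RFC 3066 tags are at most 11 octets long.
CLAIMED_MAX_OCTETS = 11


def is_rfc3066_syntax(tag: str) -> bool:
    """Return True if the whole string matches the RFC 3066 tag syntax."""
    return RFC3066_TAG.fullmatch(tag) is not None


def fits_claimed_bound(tag: str, limit: int = CLAIMED_MAX_OCTETS) -> bool:
    """Return True if the tag is well-formed and at most `limit` octets."""
    return is_rfc3066_syntax(tag) and len(tag.encode("ascii")) <= limit


if __name__ == "__main__":
    for candidate in ["sr-CS", "de-DE", "de-", "de--DE", "12-DE", "abcdefghi"]:
        print(candidate, is_rfc3066_syntax(candidate), fits_claimed_bound(candidate))
```

A fixed-size buffer in a protocol implementation could rely on a check
like fits_claimed_bound only if the bound were actually guaranteed,
which is the point at issue: a syntax with an unbounded number of
subtags gives no such guarantee.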