> Date: 2004-12-12 19:20
> From: Mark Crispin <mrc@xxxxxxxxxxxxxxxxxx>
> To: ietf-languages@xxxxxxxxxxxxx, ietf@xxxxxxxx
>
> On Sun, 12 Dec 2004, Bruce Lilly wrote:
> > If by international agreement, 'yz' becomes the designation
> > for that country, then it is rather silly to stick one's
> > fingers in one's ears and shout "NA-NA-NA-NA-NA I don't want
> > to hear you".
>
> What is silly is saying that every language tag has to have a date/time
> attribute associated with it so that computer software managing that text
> knows the language of that text.

In the specific cases of the core Internet protocols that I have
mentioned, there *is* a date/time attribute in the form of an
RFC [2]822 Date field. If we're talking about some file stored on
some machine, every OS that I know of has a date/time stamp
associated with that file. If you have something else in mind, a
concrete description and/or example might help.

> It is a disaster for language identifiers to get recycled. Something has
> to make those identifiers unique. Your notion will force the inclusion of
> a date/time stamp in language tags, to restore the uniqueness that you are
> so excruciatingly eager to abolish.

I'm not "eager to abolish" "uniqueness". There never was any guarantee
that codes would never change. Both RFCs 1766 and 3066 specifically
mention changes as a fact of life.

> > Never
> > mind the shortcomings of that particular example; consider
> > "de-DE" -- does that mean Germany as it exists today, West
> > Germany as it existed 25 years ago, Germany as it existed
> > in the 1930s, the 1900s, ...?
>
> For the 98% case, it does not matter at all.
>
> But it does matter if, one day, "DE" becomes Denmark.

In either case, understanding precisely what geographical area is
referred to requires knowing the date to a greater or lesser degree
of accuracy.

> > As far as I can tell, the draft pretends that the meaning
> > of "CS" hasn't changed, and would in fact change the meaning
> > of the currently valid RFC 3066 language tag "sr-CS".
>
> No, it restores the previous meaning of sr-CS.

But what of the current meaning under the current standard (RFC 3066 +
ISO 639 + ISO 3166)? Surely the draft would change the meaning of that
valid RFC 3066 language-tag.

> > It is very different; under the proposed draft, there is only
> > an English definition, somebody wishing to provide a French
> > definition finds that he has none and must resort to an
> > unofficial translation.
>
> Why is the situation for French different from somebody wishing to
> provide a Lower Slobbobian definition?

French is an official language used by the ISO in its publications.
"Lower Slobbobian" is probably about as meaningful as "BLURDYBOOP".

> > SO where are the French definitions?
>
> Ask a person who is bilingual in English and French to provide one.

That would lack the definitiveness which characterizes the ISO lists.

> > Well, sure. But the name is an important thing by itself.
> > It is rather pointless to ask a user to indicate the
> > language of a piece of text by selecting from a list "AB, ACE,
> > ACH,..., ZHA, ZUL, ZUN" -- the user doesn't normally refer to
> > languages by codes. It's quite a different matter to ask the
> > user to select from "Abkhaze, Aceh, Acoli,..., Zhuang (Chuang),
> > Zoulou, Zuni".
>
> Abkhaze, Aceh, Acoli,..., Zhuang (Chuang), Zoulou, and Zuni are not
> language tags. So what's your point?

They are the human-readable names corresponding to codes. For
interoperability, it is insufficient to label any and all languages
as "ZZ" with no definition of what "ZZ" means. Moreover, it is
necessary for two (or more) communicating parties to *agree* on
the meaning of "ZZ"; that is done by assigning the code "ZZ" to
an agreed-upon name. The code "ZZ" is nothing more than shorthand
for that agreed-upon name.

If one produces some text in the BCP 18 sense of "text" (spoken,
written, signed, etc.), it is useful to indicate the language of
that text; languages are known to humans by names of languages --
the codes are, as noted, merely shorthand for those names. Likewise,
somebody presented with some text may desire or need to know the
language of that text; informing that person that the language
has code "QZ" is unlikely to mean anything to most people -- only
the name corresponding to the shorthand code is likely to be
meaningful to persons other than those involved in standardizing
the codes.

> >> Note that the RFC 3066 specifies a registry that does not include French
> >> language names. I suggest that this issue should be dropped.
> > Yes, the current IANA registry has that problem for
> > the non-ISO-based tags only. If the registry is to be
> > changed to subsume ISO codes as well, that defect should
> > be remedied.
>
> Why is it a problem? Why is it a defect?

Because it unnecessarily reduces by 50% the information content
currently available.

> > On the contrary, it is preposterous to suggest that codes
> > will be attached to text by magic
>
> Here is where you are misled. Many of these tags are embedded within the
> text itself. That text may long outlive its author in an archive.

Which is precisely why the code by itself is meaningless without
the associated language name. If I write "blurfl (lang=QZ)" in
a hypothetical diary, that will be incomprehensible unless the
meaning of "QZ" is known. You have not explained how the code
came to be "embedded within the text itself" -- surely the author
didn't say (or write, or sign) "this text is in language QZ"; most
likely the language was indicated by name, or by some proxy
representing the name (such as a locale).