> Date: 2004-12-29 17:45
> From: "Addison Phillips [wM]" <aphillips@xxxxxxxxxxxxxx>
> To: ietf-languages@xxxxxxxxxxxxx, ietf@xxxxxxxx
> Reply to: aphillips@xxxxxxxxxxxxxx
>
> Comments below. I must admit that I'm losing the ability to respond to this thread, since it contains direct statements that no response will satisfy the correspondent.

I'm fairly certain that it does not. It does state that to date no satisfactory procedural method for handling changes in meanings of codes has been presented which does not itself change the meaning of tags which are currently in use.

> The origin of the draft is an individual submission, governed by the various RFCs cited. What's the problem with that? Are individual submissions somehow inappropriate now?

Individual submissions are fine for Informational and Experimental RFCs, i.e. RFCs which do not purport to be or to become standards. Individual submissions can be part of the Standards Track with AD-level support. It is possible for an individual submission to become BCP with the same caveat; however, because BCPs go into effect as standards without the phased roll-in and implementation experience that characterize the Standards Track, "BCPs require particular care" (RFC 2026). They should possess the characteristics that result from phased roll-in of Standards Track RFCs: design choices resolved, multiple independent interoperable implementations, a well-understood specification, and no known technical omissions.

> This draft does not modify the process of the IETF [...]

IETF process for use of external standards is to reference those standards as they exist, not to attempt to modify those standards by declaring bits and pieces invalid in the absence of transfer of change control from the originating body.

> Do what you feel is warranted, Bruce. You don't appear to be trying to achieve consensus, which is the touchstone of the IETF process as I understand it. If you feel issues should be taken to the IESG, then do so.
You have yourself noted that the draft is an individual submission, not the result of an IETF process. "Consensus" doesn't apply to an individual effort. If you want to adhere to IETF process, by all means ask the IESG to set up a working group, with a charter, a Chair, etc.; I fully support that.

> > > This draft defines language tags.
> >
> > Yes. And a registry format technical specification. And a matching algorithm technical specification. In addition to the registration process.
>
> .... just like RFC 3066 did.

RFC 3066 didn't dictate the registry format. The matching algorithm was much simpler -- indeed the complexity of the method in the draft under discussion is primarily due to the addition of orthogonal data as subtags. Note also that in the transition from 1766 to 3066, the specification of the Content-Language field was broken out into a separate document (RFC 3282).

> > > Other drafts, RFCs, specs, etc. define processes and applications that use them. The appropriate use of language tags is the concern of those specifications.
> >
> > Per RFC 2026, an application having specific requirements for use of Technical Specifications (TS) should provide an Applicability Statement (AS) specifying specific requirement levels for each TS involved...
>
> The draft provides specific requirements for language tags themselves, which are strings compatible with the RFC 3066 strings already used by the other specifications. The applicability and requirements for this iteration of language tags is the same as it was under RFC 3066. The language tags created do not break existing specifications. The requirements in this document were calibrated to allow all existing RFC 3066 references to remain in force without prejudice. In fact, we did NOT change things that might have otherwise been changed in order to ensure deep compatibility.
The point is that an application, such as IDNA, could specify use of tags at a certain requirement level, matching at a different requirement level (or using a different algorithm), and is probably unconcerned with registration procedure and registry format. An applicability statement for use of language tags for IDNA could therefore reference the TSs for the tag format and the matching algorithm(s) and need not mention the registration procedure or registry format. In short, I am clarifying your earlier statement about uses of technical specifications (viz. that an AS is the mechanism by which appropriate use of TS is documented).

> Ultimately, the existence of the RFC 3066 language tag registry trumps all of your arguments about this: all of the tags defined in the generative mechanism of RFC 3066bis could have been registered under 3066 (with loss of functionality for the users of those tags, to be sure). The argument that every complete tag used anywhere is trumped by the existence of the generative mechanism in RFC 3066. Registered variant subtags still must have a recommended range to which they apply. Very little has changed, except that using subtags is a bit more logical.

I've reread that several times and can't make sense of it. Could you please rephrase?

> > > If there is some text that this draft should carry to help guide implementations, please suggest it so that we can all consider it.
> >
> > It would help immensely if the 3 technical specifications (tag format, registry format, matching algorithm) were separated as separate documents to facilitate reference as independent TSs, and to facilitate any individual extensions/revisions, etc. that may be necessary in the future, and to keep those separate from the registration procedure which itself may need to be separately referenced and/or revised.
>
> Well there at last is a suggestion.
> We think splitting the draft up would not be a benefit because the three items are closely linked and have historically been in one document. There is no indication that any of these items will be separately revised in the future. While I'm sure it is possible, I think it would be wiser to keep these items together, since they have historically been together.

So why not then also throw in the closely linked specification of the Content-Language field, which has historically been in the same document (RFC 1766)? I see no substance in your response; it does not address the issue of how an implementation of an application could be facilitated (by making an AS easier to produce by providing separate documents so that requirement levels can be independently and clearly specified for the different TSs).

> > > No, the revision clearly expands the scope of language distinctions that can be represented with a language tag--quite significantly in some cases.
> >
> > Indeed, and without registration of the tags and the review process associated with that (existing RFC 3066) registration procedure. As Harald Alvestrand pointed out some time ago, that (inappropriately) shifts implementation effort from the tag generator (no registration required) to the recipient (what the heck does this mysterious tag actually *mean*).
>
> Nonsense. There is the same review process (strengthened somewhat, actually, from experience) for subtags.

RFC 3066 has no review process for subtags. They are what the ISO lists say they are. It does have a review process for IANA registered tags as part of that registration procedure, which (except for private use tags) must be followed before use of a tag not based on ISO language as a primary tag, and optional ISO country as a secondary tag.

> Harald's point, I think, is not valid because only the registered (and rarely implemented) tags were subject to scrutiny.
Not so; the ISO language and country codes are certainly subject to scrutiny (but not to second-guessing and cherry-picking).

Under RFC 3066, a tag may be generated from the standard ISO tag, or it may be an IANA registered tag (leaving aside private use tags for the moment). A parser can easily determine what such a tag is; if the primary subtag has 2 or 3 letters, it is an ISO language code. If the second subtag has 2 letters, it is an ISO 3166 country code. Anything else is either private use (primary subtag is x) or is registered as a complete IANA tag, or is an error. [de-AT-1901, incidentally, (as an example) does not meet the RFC 3066 requirement of 3 to 8 characters in the second subtag for registration with IANA...].

Under the proposed draft, anybody may legally generate a tag such as sr-Latn-CS-gaulish-boont-guoyu-i-enochian or sr-Latn-CS-gaulish-boont-guoyu-i-enochian-x-foo with *no* specific registration requirements (i.e. all components are either registered or require no registration). In the latter case, a parser can only determine that it contains a private-use subtag after wading through the other subtags. In either case, it is difficult (to say the least) for the recipient or his software to determine what the generator of that tag intended to convey.

Returning to the private use issue; in RFC 3066, as in every other case that I know of where x is used as an indicator of private use for some name, it is used as a prefix of the name, never buried deep inside the name (as provided for by the draft proposal).

> The new draft actually provides a framework in which any subtag's type can be discerned from its position and size, even if the subtag itself is unrecognized: this is actually *better* than you could obtain with the existing registry.

Not quite; in the examples above one cannot determine what "enochian" is from its size and position alone -- one needs to know that it follows a single character subtag and that the single character is not an x.
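To make the RFC 3066 positional rule described above concrete (primary subtag of 2 or 3 letters, second subtag of 2 letters, x as the primary subtag for private use), here is a minimal Python sketch. The function name and the category labels are invented for this illustration, and no lookup against the ISO lists or the IANA registry is attempted; anything the rule cannot settle by position and length alone is simply reported as "registered-or-error".

```python
import re

# Assumption: subtags are 1 to 8 ASCII letters or digits, separated by "-".
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def classify_rfc3066(tag: str) -> str:
    """Classify a tag by the position and length of its first two subtags.

    Hypothetical helper: it looks only at syntax and never consults the
    ISO 639, ISO 3166, or IANA lists.
    """
    subtags = tag.split("-")
    if not all(_SUBTAG.fullmatch(s) for s in subtags):
        return "error"

    primary = subtags[0].lower()
    if primary == "x":
        # Private use is recognised only when x is the primary subtag.
        return "private-use"

    if primary.isalpha() and len(primary) in (2, 3):
        if len(subtags) == 1:
            return "language"
        second = subtags[1]
        if second.isalpha() and len(second) == 2:
            return "language-country"

    # Everything else must be a complete registered tag, or it is an error;
    # deciding which would need a registry lookup that this sketch omits.
    return "registered-or-error"


if __name__ == "__main__":
    for example in (
        "en",
        "en-US",
        "x-foo",
        "i-enochian",
        "sr-Latn-CS-gaulish-boont-guoyu-i-enochian-x-foo",
    ):
        print(example, "->", classify_rfc3066(example))
```

Note that the last example, which carries x-foo at the end, is not reported as private use: a parser following only this rule never looks that far into the tag.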
> The generator *is* required to register non-private use subtags for use, so that statement mystifies me. You can't just use any subtag you feel like (except as private use). The recipient can access the registry to determine the meaning of any subtag (you couldn't do that before).

Surely you're not claiming that each individual generator must separately register "sr", "Latn", "CS" etc. in order to use them!?!

A recipient using software that interprets RFC 3066 tags isn't going to be able to do anything useful with any hypothetical tag which contains a script subtag that would be produced under the draft rules (if the script subtag were to appear *after* the region subtag, one could at least match "sr-CS-Latn"[...] to "sr-CS", which an RFC 3066 parser could handle). Again returning to private-use, an RFC 3066 parser can (only) determine that a private-use tag is in use if it has x as the primary tag. There are provisions in the draft syntax that break backwards compatibility.

> > What about core Internet protocols such as MIME and the Internet Message format (STD 11)?
>
> I could have cited those. The example was not intended as an exhaustive list, eh? Are you suggesting that XML isn't an important technology? [...]
>
> So what? We don't like the W3C or something?

XML isn't an IETF protocol or format. Whether or not it is "important", for any meaning of that word, is irrelevant. The point is that given the IETF's limited resources, it concentrates on Internet technology (see RFC 3935) and it needs to take (core) Internet protocols into account in IETF specifications such as RFCs (BCP or otherwise).

> Well you can't have it both ways. Either CS means Czechoslovakia or it means Serbia and Montenegro.

Certainly in language tags "CS" is in use to mean Srbija i Crna Gora-Srpski. I haven't seen any documented cases where it is used (in language tags) to mean Czechoslovakia (but I haven't started any archaeological digs to try to uncover any).
If there has been no such use, then the brouhaha over the change is much ado about nothing. If there has been such use, then it's clear that interpretation is going to have to be linked to time of generation of the tag if the semantics are to be preserved.

> You can see an early version of draft-09 that attempts to address it here:
>
> http://www.inter-locale.com/ID/draft-phillips-langtags-09.html
>
> Your comments on that would be appreciated.

For the moment, we're discussing draft-phillips-langtags-08, on which IESG action is pending (in a week). There are many things that the IESG might do when it makes its decision; in prudence, I'll wait to see what they decide. IMO, discussing multiple revisions of a draft through multiple IESG New Last Calls isn't the most efficient or effective way to make progress.

> > > We greatly expanded what can be represented in four major ways:
> > >
> > > 1. Added script subtags for writing system variations.
> > > 2. Mixed generative and private use subtags for private minor distinctions in tags.
> > > 3. Extensions for really specialized distinctions.
> > > 4. UN M49 region codes, including supra-national regions to represent geographical distinctions not covered by ISO 3166 or by instability in same.
> >
> > It's not entirely clear if some of those items (e.g. script) should be expressed by an orthogonal mechanism rather than embedded in a *language* tag (for that matter, in retrospect, country codes was probably a bad idea).
>
> There would be no RFC 1766 or 3066 if ISO 639 language codes actually captured all of the nuances of language (doh!).

Well, there was a need for separate registered tags and for specification of private use tags, so I don't think that's quite right. It sounds like 639-3 might provide substantially greater coverage.
> There is a clear need for script codes for distinguishing certain kinds of Chinese written material, as well as certain languages in which there are active script transitions or in which the language is commonly written in more than one script. Individuals not connected with this effort have attempted to register similar language tags recently. It is important to identify the writing system in those cases to many users.

But none of that applies to an audio file of spoken material, where script would be superfluous and, as noted above, would lead to loss of backwards compatibility. Surely some types of script are indicated by the charset; in situations where that is not the case, a separate mechanism could be used for that orthogonal parameter without breaking compatibility with existing parsers of language tags.

> > The whole "stability" brouhaha seems to be a tempest in a teapot. Surely the issue could be addressed in a professional manner by reaching an agreement with ISO/UN regarding the issue, as has been done for the case of 2-letter vs. 3-letter codes and stability of existing 3-letter codes.
>
> It is only *one* of the things addressed by the draft. But it is and remains important. Doug Ewell suggested to me that even if no RA or MA ever reuses a code again, it is still ISO 3166/MA's job to keep the codes in sync with the current state of the world. Whenever countries split up, join together, or change names, ISO 3166/MA will be there to change the code list. The instability is not all the MA's fault, but we still need to protect against it because of legacy data. The lonely CS example should not become the state of affairs going forwards.

Does the ISO not set ground rules for the 3166/MA? Could it not specify that codes are not to be reused?

> Matching hasn't actually changed.

I beg to differ.
Introduction of a script subtag between language and country code changes matters considerably, in a manner which breaks backwards compatibility.

> The existence of multiple mechanisms isn't really an issue. The draft specifies ONE mechanism, just like RFC 3066, and notes that more specialized processing is possible.

It's an issue that calls for a separate specification to facilitate reference (by an AS) to the mechanism or mechanisms which are applicable, at their respective requirement levels, without confusion about what specification is being referenced.

> > > If one specifies "en-FR", then one should not expect to receive anything less specific than "en-FR".
> >
> > Are you referring to use in Accept-Language fields or in Content-Language fields (or equivalent accept/send dichotomy)?
>
> Yes and no. Accept/Content is one example of matching. Another might be a query on a document (as with XQuery on an XML document, for example). The remove-from-right matching rules in RFC 3066 (and the draft) have long had this particular design.

> > > In software resources generally one specifies the *most specific* (granular) tag that one will accept and may receive less specific content (which may include the default content).
> >
> > Indeed; hence the question above. [I also note in passing that IETF deals with the Internet in particular, not with "software resources generally".]
>
> So?

Do you not see the contradiction between "one should not expect to receive anything less specific" vs. "may receive less specific content"?

> Are you not aware of things like message catalogs, resource bundles, and the like?

I'm aware of many things. But as noted, the IETF has limited resources, and concentrates on Internet issues; it does not have delusions of being able to solve all of the world's problems.
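The two behaviours being contrasted in this exchange can be sketched side by side in Python. This is only an illustration under stated assumptions: range_matches approximates the prefix style of matching discussed for language tags (a range matches a tag that is equal to it or that extends it at a "-" boundary), while fallback_lookup approximates the resource-bundle style in which the request is truncated from the right until something available is found. Both function names are hypothetical and neither is taken from RFC 3066 or from the draft.

```python
from typing import Iterable, Optional


def range_matches(language_range: str, tag: str) -> bool:
    """Prefix-style matching: the range is the *least specific* acceptable tag."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def fallback_lookup(requested: str, available: Iterable[str]) -> Optional[str]:
    """Resource-bundle style: the request is the *most specific* acceptable tag.

    Subtags are removed from the right of the request until one of the
    available tags is hit; None means nothing matched at all.
    """
    by_lower = {a.lower(): a for a in available}
    parts = requested.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()  # drop the rightmost subtag and try again
    return None


if __name__ == "__main__":
    # Prefix matching never yields something less specific than the range.
    print(range_matches("en-FR", "en"))
    print(range_matches("en", "en-FR"))
    # Fallback lookup may return less specific content.
    print(fallback_lookup("en-FR", ["en", "de"]))
    # A script subtag placed before the region defeats a plain prefix match.
    print(range_matches("sr-CS", "sr-Latn-CS"))
    print(range_matches("sr-CS", "sr-CS-Latn"))
```

Under these assumptions, a request for "en-FR" can come back as plain "en" from the fallback lookup but never from the prefix match, and "sr-CS" matches "sr-CS-Latn" but not "sr-Latn-CS", which is the backwards-compatibility point made about the position of the script subtag.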
> > > In language tag matching one specifies the *least specific* tag that one will accept and won't receive anything less specific (although you might receive something more specific).
> >
> > I'm not sure; if one indicates acceptance of Franglais (en-FR), receiving plain en is probably acceptable. Receipt of en-FR-<Brittany> for whatever mechanism is used to indicate the variant of English spoken in the region of Brittany (where Breton is a Gaelic language, rather than one derived from Latin, like French, or of Germanic root, like English) in the country of France, might well be incomprehensible to an English-speaking Frenchman from Alsace. [Let's not confuse the specific example with the general principle which it illustrates.]
>
> That's the small point I'm illustrating.

But in response to JFC, you specifically said that "one should not expect to receive anything less specific". It seems to me that receipt of less specific (i.e. more general) is OK.

> Your example of Breton is a bad choice of tags, though. Breton has its own ISO 639 code ("bre").

But the tag refers to a dialect of English spoken (as a second language) by a Breton, not to the Breton language per se (and in a cursory look, I didn't see a UN M49 region code for Brittany).

> I doubt that en-US-boont is fully intelligible to anyone from more than a few miles outside Boonville without a dictionary.

Fine, but that isn't representative of the situation that JFC posed. The representative question would be "does a resident of Boonville, who speaks en-US-boont, understand en-US?".

> > > Changing the sources for existing subtags or the interpretation of any particular existing language tag is not permitted if we are to maintain backwards compatibility.
> >
> > Agreed that there would be a backwards compatibility problem with changing the source.
> > Which is why there is an issue with "CS" being defined in the ISO lists by reference as is currently the case with RFC 3066, vs. the proposal to change the source to a separate IANA registry which handles "CS" specially (i.e. differently from many other ISO-derived codes).
>
> Yawn.

Please see RFC 2026 sections 7.1, 7.1.1, 7.1.3, and 10.1. Note that RFC 3066 strictly complies with those sections, while the draft under discussion, by cherry-picking from ISO lists for which change control has not been transferred to the IESG, does not.

> > > To be perfectly blunt: we've worked over a year on this project. If you have specific comments on this draft, with suggestions for improvements, please send those to the list so that they can be viewed by the community and so that Mark and I can address them. Your suggestions for additional changes to the syntax of language tags we find to be incompatible (to the extent that we understand them) with RFC 3066 and our own work on draft-langtags. You will note that draft-langtags can accommodate your requirements using the mechanisms spelled out above and in the draft... so I fail to see what we should change. If you can express that, we'll consider it. Otherwise you are free to do as we did and write your own draft. Internet-Drafts are a volunteer effort and do not write themselves. Neither is there a Star Chamber of people who create them in the dead of night. If you see a need, fill it. I would suggest: wait for draft-langtags to be an RFC and write an extension that does what you want.
> >
> > See RFC 2418; specifically section 2.3 and the comment about consensus about a wrong design. See also the RFC 2026 process requirements and RFC 2418 procedures; a group which has no charter or equivalent document, no written record of meetings, etc. might very well be described as "a Star Chamber of people".
>
> There is a list archive.
> You can see the discussion and the drafts (I maintain all of them online).

That addresses only one of the issues. It does not address the issue of a charter, of conflict resolution procedures, minutes of face-to-face meetings, etc. (and the list was established for a purpose other than work on an RFC).

> Discouraging people from participating in the IETF process is, I think, odious.

Agreed. But the activity on the ietf-languages list regarding the draft under discussion isn't an IETF process -- there is no WG or Chair, no charter, etc. Like the fictional Topsy, it jes' growed up.

> The current draft REPLACES RFC 3066.

Drafts don't replace RFCs.