On Thu December 9 2004 12:23, ietf-announce-request@xxxxxxxx wrote:
> New Last Call: 'Tags for Identifying Languages' to BCP
> Date: 2004-12-08 17:56
> From: The IESG <iesg-secretary@xxxxxxxx>
> To: IETF-Announce <ietf-announce@xxxxxxxx>
> Reply to: iesg@xxxxxxxx
>
> The IESG has been considering
>
> - 'Tags for Identifying Languages '
>   <draft-phillips-langtags-08.txt> as a BCP
>
> There have been considerable changes to the document since the
> initial last call, and the IESG would like the community to consider
> the changes. In addition, the authors have prepared text describing
> why this mechanism is needed as a replacement for the existing
> procedure; it is included below.
>
> The IESG plans to make a decision in the next few weeks, and solicits
> final comments on this action. Please send any comments to the
> iesg@xxxxxxxx or ietf@xxxxxxxx mailing lists by 2005-01-05.
>
> The file can be obtained via
> http://www.ietf.org/internet-drafts/draft-phillips-langtags-08.txt

I have some comments below. They should not be construed as a complete or thorough critique of the draft; they're initial comments based on a quick review of the draft.

One overall comment; I'm surprised to hear that this was already at last call -- some notice to mailing lists which are heavily affected by the proposed changes (e.g. ietf-822) would have been nice... Considering the depth and breadth of the specific issues discussed below, I'm not sure that "surprise" is adequate...

> This specification, the proposed successor to RFC 3066, addresses a number of
> issues that implementers of language tags have faced in recent years:
[...]
> * Accessibility of the underlying ISO standards for implementers
[...]
> There are problems with the the RFC 3066 definition of generative tags,
> however. The ISO 639 and ISO 3166 standards are not freely available and evolve
> over time.

Accessibility has not been a problem for this implementor (who, incidentally, was unaware of this draft until the New Last Call).

ISO 639 language code lists are readily available in HTML-ized English and French via
http://www.loc.gov/standards/iso639-2/englangn.html
and
http://www.loc.gov/standards/iso639-2/frenchlangn.html

ISO 3166 country code lists are readily available in plain text in English and French via
http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1-semic.txt
and
http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-fr1-semic.txt

The ISO registered code lists are freely available at the URIs given above. This implementor has used those URIs for years without difficulty. The ISO standards themselves are not free, but neither are they required for an implementor to identify the valid codes -- the free lists suffice for that purpose.

> The largest change in the specification is that it modifies the structure of
> the language tag registry. Instead of having to obtain lists of codes from five
> separate external standards (not all of which are easily available), the IANA
> registry will maintain a comprehensive list of valid subtags that can be used in
> the generative mechanism in a machine-parseable text format.

Contrary to the implicit claim, the ISO documents mentioned above comprise two standards (available in two languages each), not "five separate external standards". The availability of those two definitive standards in bilingual forms allows implementors to (for example) construct menus of available language and country code tags in BOTH languages used in ISO standards. The draft proposes declaring those standards effectively irrelevant, being replaced by a single monolingual (English) IANA registry.

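To make the bilingual-menu point concrete, here is a minimal sketch of pairing the English and French country code lists. It assumes each list holds one "NAME;CODE" entry per line after a header line; that layout, the sample rows, the French header text, and the function names are illustrative assumptions rather than anything quoted from the ISO files.

```python
def parse_semicolon_list(text):
    """Map two-letter codes to names from 'NAME;CODE' lines (assumed layout)."""
    names = {}
    for line in text.splitlines():
        # Split on the last semicolon so names containing ';' stay intact.
        name, sep, code = line.strip().rpartition(";")
        # Header and blank lines lack a two-letter alphabetic code; skip them.
        if sep and len(code) == 2 and code.isalpha():
            names[code.upper()] = name
    return names


def bilingual_menu(english_text, french_text):
    """Return (code, English name, French name) tuples sorted by code."""
    english = parse_semicolon_list(english_text)
    french = parse_semicolon_list(french_text)
    # Fall back to the English name if a code is missing from the French list.
    return [(code, english[code], french.get(code, english[code]))
            for code in sorted(english)]


# Hypothetical sample data standing in for the two downloaded lists.
SAMPLE_EN = "Country Name;ISO 3166-1-alpha-2 code\nFRANCE;FR\nGERMANY;DE\n"
SAMPLE_FR = "Nom du pays;Code\nFRANCE;FR\nALLEMAGNE;DE\n"

for code, en_name, fr_name in bilingual_menu(SAMPLE_EN, SAMPLE_FR):
    print(f"{code}  {en_name} / {fr_name}")
```

A monolingual registry would supply only one of the two name columns, which is the loss I am objecting to.
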
While it has become fashionable in recent years among some factions within the United States to bash France, the French people, their culture, and their language, it seems inappropriate to extend such bashing to technical standards which supposedly apply in an international context. Especially when dealing with the subject matter of language itself. The unavailability of the registered value "description" in 50% of the languages traditionally used for international standards publication, including the existing ISO 639 and 3166 codes, is a serious defect in the proposal, and a departure from the status quo under RFC 3066 (which directly refers to the bilingual ISO standards as definitive).

[N.B. I am not accusing the draft authors of French-bashing; it's just that some of us are a bit more sensitive to Anglo-centricity than others. And it remains a fact that the draft has no provision for bilingual descriptions of any subtag fields. (I note in passing that the UN regional codes newly referenced by this draft are available in HTML-ized (ostensibly) English (though I've never seen an A-ring in English text before...) and French).]

It is claimed that:

> In addition, and very importantly, language tags that are newly
> defined by this specification are compatible with the ABNF syntax, matching,
> parsing, and other mechanisms defined by RFC 3066.
[...]
> The design of this
> specification was carefully created so that all of the new values that can be
> assigned fit the pattern for registered language tags under RFC 3066.
[...]
> The revision proposed in this
> specification addresses the needs of this community of users with a minimal
> impact on existing content and implementations, while providing a stable basis
> for future development, expansion, and improvement.

The ABNF in the draft permits all of the following tags which are not legal per the RFC 3066 ABNF:

   supercalifragilisticexpialidoceus
   y-----
   x1234567890abc
   a123-xyz

Specifically, the draft allows, and RFC 3066 disallows:

   subtags more than 8 octets in length
   hyphens which do not separate subtags
   zero-length subtags
   primary tags which are not purely alphabetic

Curiously, all of those are permitted by the draft ABNF production "grandfathered", which is presumably included to accommodate tags which ARE permitted by RFC 3066, rather than to provide a means for specifying incompatible tags. (I have no provision for parsing unlimited-length subtags, zero-length subtags, hyphens not delimiting subtags, or non-alphabetic primary tags, so I know of one implementation which will suffer a major impact from the incompatible syntax change.) I see no reason for the ABNF to permit such content as is forbidden by RFC 3066; the actual ABNF for what RFC 3066 permits is contained within 3066, and could have been directly incorporated rather than producing a "grandfathered" production which opens up several cans of worms.

One defect related to tag length in RFC 3066 is not remedied by the draft; indeed the problem is greatly exacerbated. One use of language tags is in encoded-words as specified by RFC 2047 as amended by RFC 2231 and errata. The total length of an encoded word, including some syntactic glue, a charset tag, and some text content in addition to a language tag, is strictly limited. Unfortunately, a language-tag's length is unlimited by the ABNF in RFC 3066 (due to an unlimited number of subtags) and in the draft. To date, the problem has been more theoretical than practical due to the limited number of subtags typically used. In particular, tags other than private-use tags with more than two subtags require registration under RFC 3066 rules, and it is a trivial matter to determine the longest registered tag.

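Both points above can be checked mechanically. The following is a rough sketch, not code from either document: the regular expression transcribes the RFC 3066 grammar (a primary subtag of 1 to 8 letters, then hyphen-separated subtags of 1 to 8 letters or digits), and the budget function applies the 75-character encoded-word limit of RFC 2047 to the RFC 2231 form "=?charset*language?encoding?text?=". The function names are my own, chosen for illustration.

```python
import re

# RFC 3066: Language-Tag = Primary-subtag *( "-" Subtag )
#           Primary-subtag = 1*8ALPHA ; Subtag = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# RFC 2047 caps an encoded-word at 75 characters, delimiters included.
MAX_ENCODED_WORD = 75


def is_rfc3066_tag(tag):
    """True if the whole string matches the RFC 3066 Language-Tag grammar."""
    return RFC3066_TAG.fullmatch(tag) is not None


def encoded_text_budget(charset, language=None, encoding="Q"):
    """Characters left for encoded-text in one encoded-word.

    A negative result means the charset and language tag alone
    already exceed the 75-character limit.
    """
    overhead = (len("=?") + len(charset) + len("?")
                + len(encoding) + len("?") + len("?="))
    if language:
        # RFC 2231 appends "*" and the language tag to the charset.
        overhead += len("*") + len(language)
    return MAX_ENCODED_WORD - overhead
```

All four example tags listed above fail `is_rfc3066_tag`, while a tag such as "en-US" passes. With charset iso-8859-1 and language "en", 55 characters remain for the encoded text; every additional subtag comes straight out of that remainder.
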
The draft, however, encourages use of more subtags as well as removal of the subtag length upper bound; moreover, it permits infinite numbers of subtags without requiring registration of the resulting complete tag. Consequently it is impossible to establish an upper bound on the length of a language tag which might be encountered -- that affects not only practical implementations, but it negatively impacts protocol design, such as the MIME encoded-word case.

> The new registry provides a complete,
> easily parseable file which provides the precise the contents of valid tags for
> any point in time.

That is the first time I have ever heard ISO 8601 date format described as "easily parseable". Perhaps the draft authors meant to say that a specific subset of the tortuously complex ISO 8601 date format is used, but that is not what the draft states. This implementor does not look forward to having to parse all of the various and sundry ISO 8601 variants.

[Moreover, while the draft authors have complained on the one hand about unavailability of ISO documents regarding language and country codes (where in fact the code lists needed for implementation are freely available), on the other hand they specifically require adherence to a standard which is not freely available, and which is required in order to be able to parse the proposed revised registry (the existing IANA language-tags registry does not appear to require use of that standard specifically, nor do the ISO code lists). According to the ISO web site, ISO 8601 costs either 108 or 122 Swiss francs.]

I am absolutely shocked that a draft dealing with language lacks an "Internationalization considerations" section as recommended by RFC 2277 (a.k.a. BCP 18).

Perhaps even more disturbing is the content of the "IANA Considerations" section; the draft predicts that certain things will happen ("IANA will"[...]), but doesn't actually direct (e.g. "IANA shall") IANA to do anything.
The placement of that section does not correspond to current RFC-Editor guidelines (it should appear after Security Considerations); also on that point, Appendices should precede References. Many of the references are obsolete (e.g. RFCs 1327, 1521), there is no differentiation between normative and informative references, and at least one reference ([19]) gives a bracketed URI rather than the correctly formatted RFC reference. Although reference is made to the "Accept-Language" header field, RFC 3282 (the defining RFC for that field) is not listed among the references.

The formatting of the draft is atrocious, particularly the bizarre "outdenting" (in some cases breaking in the middle of words) near the bottom of page 7, towards the lower part of page 10, the middle of page 13, near the bottom of page 16, towards the bottom of page 19, towards the lower part of page 23, at the bottom of page 29, the second-last text line on page 33, and immediately before References (which incidentally lacks a dot after the section number). Some text also appears to be missing after the last "bullet".

I am extremely surprised that the draft has been published at least nine times in such a state of poor formatting and poor attention to editorial content (e.g. obsolete and missing references), and that it progressed as far as IESG last call in such a state, with no Internationalization considerations section, etc.

I am particularly concerned about the implementation ramifications of the proposed changes, especially (as noted in detail above):

1. the apparent contradiction between the stated objectives w.r.t. accessibility of relevant ISO data and standards and the reality of the proposal's implications (ISO 8601 date format parsing).

2. the clear contradiction between the claims about ABNF compatibility with RFC 3066 and the factual incompatibility of certain provisions in the grammar.

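On the first point, a small sketch of why "ISO 8601 date" is not one format. It renders a single date in four of the representations the standard defines (calendar extended, calendar basic, ordinal, and week date) and shows that a parser written for only the familiar YYYY-MM-DD form rejects the other three. Whether the registry would in practice use more than one of these forms is my assumption about what the draft's wording permits; the helper name is illustrative.

```python
from datetime import date, datetime

# The date of this message, used purely as sample input.
sample = date(2004, 12, 9)
iso_year, iso_week, iso_weekday = sample.isocalendar()

# Four ISO 8601 representations of the same day.
representations = {
    "calendar, extended": sample.strftime("%Y-%m-%d"),
    "calendar, basic": sample.strftime("%Y%m%d"),
    "ordinal": sample.strftime("%Y-%j"),
    "week date": f"{iso_year}-W{iso_week:02d}-{iso_weekday}",
}


def parse_extended_only(text):
    """Parse YYYY-MM-DD; return None for any other representation."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


for label, text in representations.items():
    status = "parsed" if parse_extended_only(text) else "rejected"
    print(f"{label:20} {text:12} {status}")
```

Naming the single permitted form in the draft would remove the need for implementors to handle the rest.
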
Considering the technical importance of those issues, I would request that the IESG consider returning this draft to the authors for further work before reconsidering it for last call -- I'd want to have a chance to thoroughly review the ABNF after the authors have addressed the compatibility issue vs. RFC 3066 before this gets as far as actually obsoleting current BCP (3066).

I have copied the ietf and ietf-languages mailing lists in addition to the iesg list as requested; I have set a suggestion for followup to the ietf list.