Re: New Last Call: 'Tags for Identifying Languages' to BCP


Bruce Lilly has posted comments on the IETF list in response to the last-call announcement for a proposed revision to RFC 3066. His comments were generally negative, raising a number of concerns. I and others involved in preparing the revision have discussed Bruce’s concerns with him, but that discussion did not appear on the IETF list, since those of us other than Bruce were not subscribed to it. I wish to briefly summarize the outcome of that discussion for the benefit of people here.

 

Some of Bruce’s comments were purely editorial (e.g. formatting of the draft); I will not review those.

 

Bruce’s substantive concerns were:

 

-         Accessibility of the source ISO standards was cited in the announcement as a major reason for the proposed revision, but accessibility has not been a problem in his experience.

 

-         RFC 3066 directed users to the source ISO standards; the proposed revision would establish a registry that includes all ISO identifiers considered valid for use in language tags, but the registry’s documentation for those identifiers does not include both English and French language and country names.

 

-         The proposed revision makes reference to the ISO 8601 date/time format being used in the registry, and ISO 8601 is a complex and not readily available specification.

 

-         The ABNF used in the proposed draft permits many strings that do not conform to RFC 3066.

 

-         The proposed revision imposes no bounds on the length of tags (same as RFC 3066), and does not require registration of complete tags (different from RFC 3066).

 

-         The lack of an "Internationalization considerations" section as recommended by RFC 2277 (a.k.a. BCP 18).

 

As a result of Bruce’s comments, those of us contributing to the development of this revision have suggested certain revisions, and the authors have indicated openness to them. As I will explain, these revisions would clarify various matters but would not constitute technical changes to the draft.

 

1. Re accessibility: it was pointed out that the draft itself does not identify accessibility of the source ISO standards as one of the primary reasons for the revision. There are some minor accessibility concerns having to do with uncertainty about the ongoing availability of the relevant ISO code tables and of change histories for each of the relevant ISO standards. The proposed changes to the language-tag registry address these concerns, though there were bigger reasons for the proposed registry changes, particularly having to do with stability.

 

 

2. Re the lack of French descriptions in the registry: it was pointed out that the registry defined by RFC 3066 did not include French descriptions, and that the revised registry is not intended to replace the source ISO standards or make them irrelevant. The meaning of IDs would still be established from the ISO standards from which they were drawn, and the proposed revision would continue to make reference to them. As a result of Bruce’s comments, it was suggested that wording be revised in the draft to make this relationship clearer.

 

 

3. Re ISO 8601 time/date format: The registry uses only dates expressed in the format “YYYY-MM-DD”. It was agreed that it would be better to identify that format precisely rather than make a generic reference to ISO 8601.
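
For illustration, a minimal Python sketch of what checking that precise format could look like, assuming a registry date must be exactly four digits, two digits and two digits separated by hyphens, and must name a real calendar date. The function name is hypothetical and nothing here comes from the draft itself.

```python
import re
from datetime import date

# ASCII digits only ([0-9] rather than \d, which also matches non-ASCII digits).
_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def is_registry_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not _DATE_RE.fullmatch(value):
        return False
    year, month, day = (int(part) for part in value.split("-"))
    try:
        date(year, month, day)  # rejects impossible dates such as 2005-02-30
    except ValueError:
        return False
    return True
```

Stating the format this narrowly avoids pulling in the many other representations ISO 8601 allows (week dates, ordinal dates, times, time zones).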

 

 

4. Re the less restrictive ABNF: the one place with less restrictive syntax was a production rule subject to additional strict constraints, namely that only certain pre-existing tags registered under RFC 3066 could fall under that production. A change has been suggested that would make the ABNF at that point consistent with the ABNF of RFC 3066. This change has no technical consequence, as it does not alter the set of valid tags.
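
For comparison, a sketch of a syntax check against the RFC 3066 ABNF (section 2.1). It tests only well-formedness, not registration, and does not attempt to reproduce the ABNF of the proposed revision; the function name is hypothetical.

```python
import re

# RFC 3066, section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# ABNF ALPHA and DIGIT are ASCII-only, so the character classes are spelled out.
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def is_rfc3066_syntax(tag: str) -> bool:
    """True if tag matches the RFC 3066 grammar; says nothing about registration."""
    return _RFC3066_TAG.fullmatch(tag) is not None
```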

 

 

5. Re upper bounds on length of tags: It was pointed out that private-use tags have no bounds on their length under either RFC 3066 or the proposed revision. The greater concern was for non-private-use tags. For these, it was pointed out that RFC 3066 also imposes no bounds on length. Admittedly, though, there is a difference: RFC 3066 requires registration of complete tags, so one can determine at any time the longest valid tag that may be encountered, whereas the proposed revision requires registration of sub-tags, which can then be combined productively, so one cannot predict with certainty what combinations will be used. (This, IMO, is the most significant of the concerns Bruce raised.)
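
To make the RFC 3066 side of that difference concrete: because complete tags are registered, the longest registered tag can be read directly off the registry. A minimal sketch, assuming the application already holds the registered tags as a list of strings; the function name is hypothetical and the sample list is illustrative, not an extract of the registry.

```python
from typing import Iterable

def longest_registered_tag(registered_tags: Iterable[str]) -> int:
    """Length of the longest complete tag in the given list (0 if empty)."""
    return max((len(tag) for tag in registered_tags), default=0)

# Illustrative input; a real application would load the current registry.
SAMPLE_TAGS = ["i-klingon", "zh-min-nan", "sgn-BE-fr", "de-1901"]
```

Under the proposed revision no such list of complete tags exists, so this computation has no direct equivalent.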

 

While the proposed revision allows productive combinations of registered sub-tags, there are some limits on how combinations can be made, as specified by the ABNF. The ABNF does, however, allow unlimited repetition of three elements.

 

One of these (‘extlang’) is defined by the ABNF in anticipation of a possible future extension of the language-tag specification to incorporate mechanisms expected in a new part of ISO 639 that is in preparation, but it is not made available for use at this time.

 

Another (‘variant’) requires sub-tags to be registered, and requires that each registration indicate the prefix sub-tags with which the variant is recommended for use. While it may still be technically valid to use a registered variant in some other way, that will be unlikely (just as certain combinations valid under RFC 3066, such as ja-DE, are unlikely). Thus, implementers will have a reasonable chance of anticipating what combinations will be used.
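
A sketch of how an implementer might flag variants used outside their recommended prefixes. The `VARIANT_PREFIXES` table is a hypothetical stand-in for registry data, with illustrative entries only; a tag that fails this check is not thereby invalid, merely an unlikely combination.

```python
from typing import Dict, List

# Hypothetical excerpt: variant sub-tag -> recommended prefixes (illustrative).
VARIANT_PREFIXES: Dict[str, List[str]] = {
    "1901": ["de"],
    "rozaj": ["sl"],
}

def uses_recommended_prefix(tag: str) -> bool:
    """True if every known variant in tag follows one of its recommended prefixes."""
    subtags = tag.lower().split("-")
    for i, subtag in enumerate(subtags):
        if subtag == "x":
            break  # private-use sub-tags are not checked
        prefixes = VARIANT_PREFIXES.get(subtag)
        if prefixes is None:
            continue  # not a variant this sketch knows about
        before = subtags[:i]
        # Compare sub-tag by sub-tag so that "de" matches "de-CH-1901" too.
        if not any(before[: len(p.split("-"))] == p.split("-") for p in prefixes):
            return False
    return True
```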

 

The third of these (‘extension’) is defined as a mechanism for extending language tags for use in future protocols. There is an upper limit of 25 extensions, though the proposed revision does not define limits on the length of each extension. No extensions are defined at this time, and any extension would require specification in a separate RFC. When one or more extension RFCs are defined, those specifications would indicate what limits they do or don’t impose on the length of extensions. For any protocol that supports this proposed revision to RFC 3066 but does not support extensions, any extensions included in a language tag can be ignored.
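
A sketch of how a protocol that does not support extensions might ignore them. It assumes, following the draft's convention, that each extension is introduced by a single-character sub-tag (a "singleton") other than "x", and that "x" introduces private-use sub-tags, which are kept. The function name is hypothetical.

```python
def strip_extensions(tag: str) -> str:
    """Drop extension sequences from tag, keeping all other sub-tags in order."""
    subtags = tag.split("-")
    kept = [subtags[0]]  # the first sub-tag never introduces an extension
    in_extension = False
    for i, subtag in enumerate(subtags[1:], start=1):
        if subtag.lower() == "x":
            # Private use: keep "x" and everything after it unchanged.
            kept.extend(subtags[i:])
            break
        if len(subtag) == 1:
            in_extension = True  # a singleton starts a new extension
            continue
        if not in_extension:
            kept.append(subtag)
    return "-".join(kept)
```

Because extensions come after the other sub-tags, an implementation can discard them without having to understand their content.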

 

Apart from extensions, all of the mechanisms introduced in the proposed revision respond to the direction in which users and implementers were already going with tags registered under RFC 3066. Thus, while the proposed revision makes greater provision for lengthy tags, that provision is not completely unrestrained, and the practical likelihood of encountering tags of any given length would be no greater under the proposed revision than it was under RFC 3066.

 

Even so, various changes were suggested to highlight issues related to length, specifically with a view to the possibility that some applications of RFC 3066 (or of this proposed revision) would impose fixed limits on the length of tags. These suggestions included notes to that effect at key points within the RFC, and also in sub-tag registrations and in RFCs defining extensions. (For instance, a variant registration would include not only a recommendation on appropriate prefixes but also specific comments on the maximal length of tags using the given variant.) There were no suggestions to impose limits on the length of tags in the RFC itself (just as RFC 3066 does not impose limits). Basically, limits on length were seen as a concern belonging to particular applications of the language-tag spec rather than to the spec itself, but significant text would be added to the RFC so that these concerns are highlighted.
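
Since length limits are left to particular applications, here is one hypothetical policy such an application might adopt: drop whole sub-tags from the end until the tag fits, never leaving a dangling single-character sub-tag. This is an assumption for illustration, not something the draft specifies.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten tag to at most max_len characters by dropping trailing sub-tags.

    If even the first sub-tag is longer than max_len, it is returned as-is;
    a real application would need its own policy for that case.
    """
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_len:
        subtags.pop()
        # Don't leave a trailing singleton (e.g. an extension or "x" introducer).
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)
```

Dropping whole sub-tags keeps the result well-formed, at the cost of losing the most specific information first.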

 

 

6. Re an i18n-considerations section: It was pointed out that language tags are symbolic identifiers with no culture-specific content; the only i18n consideration related to the identifiers themselves is charset, and charset issues are covered in the section on syntax. Bruce was also concerned about i18n considerations in the registry (see issue #2 above: the lack of French-language descriptions), but it was pointed out that the content of the registry is not intended as localization data, that there are well-established precedents for code sets that are not documented with multilingual content, and therefore that it was not really necessary to discuss i18n concerns in relation to the registry (no more than it is necessary to have a section discussing i18n issues in relation to the IANA charset registry in RFC 2978).

 

 

In conclusion, I think that some of Bruce’s concerns were valid, and suggestions for changes have been presented to the authors accordingly. I believe all of these changes can be considered to be for clarification purposes, rather than technical changes. (No changes affecting the set of valid tags have been made.)

 

 

 

Thanks.

 

Peter Constable

GIFT | GPTS | MICROSOFT

 

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf
