> From: ietf-languages-bounces@xxxxxxxxxxxxx [mailto:ietf-languages-
> bounces@xxxxxxxxxxxxx] On Behalf Of Bruce Lilly
>
> As mentioned, the limit is imposed by other tight constraints on
> 'grandfathered'; you have already identified that the longest registered
> tag under RFC 3066 is 11 octets in length, therefore a 'grandfathered' tag
> can be at most 11 octets in length.
>
> But the constraints probably aren't as tight as you
> believe; the draft specifically permits a future
> revision to allow a primary subtag longer than
> 8 octets, or not purely alphabetic, etc.

RFC 3066 does not impose any restrictions on what its replacements
might do. This is the case with any specification: a technical
specification is not a specification of human behaviour, and it cannot
keep us from revising or replacing the spec in any way we may choose.

> One would hope that under RFC 3066 rules, that the
> reviewer, a list subscriber, or an Applications Area
> Director would recognize the conflict with RFCs 2047/2231
> and would object.

You have mentioned conflict with RFCs 2047 and 2231. RFC 2047 does not
make reference to language tags. The ABNF of RFC 2231 does not impose
any limit on the length of language tags. RFC 2231 does contain an
implicit length issue in that it updates RFC 2047, allowing language
tags within encoded words, but it does not explicitly identify any
upper bound on the length of language tags.

By reading both RFC 2047 and RFC 2231, one finds that they assume that
a language tag must be at most 64 characters long:

- the maximum length of the encoded-word production is 75 characters
  (not stated in the ABNF of RFC 2047 but rather in the prose)

- the encoded-word production of RFC 2047 includes 6 literal characters

- RFC 2231 adds one delimiting character "*" between the charset and
  the language tag

- the shortest charset names are 2 characters long (e.g. "IT")

- the shortest encoding is 1 character long

- the minimum encoded-text length is 1 character

An encoded-word must therefore contain at least 11 characters that are
not part of the language tag, and its total length can be no more than
75 characters. Hence, an upper bound on language tags that can be used
in an RFC 2047/2231 encoded-word production is 64 characters. In many
cases, where the charset name or encoding is longer, the upper bound on
the length of language tags will be lower, but the RFC gives no
estimate or indication of how much lower.

This is a constraint on an application of RFC 3066; it is not a
constraint on RFC 3066 itself. Other applications of RFC 3066 may
impose limits that are longer or shorter than the one imposed by
RFC 2047/2231. I see no reason why limits must be added as a constraint
in a revision of RFC 3066. It would be a good idea, however, to point
out in section 2.1 of the draft that some applications of this
specification may impose limits on the length of accepted language
tags, and perhaps to cite RFC 2231 as an example.

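The arithmetic behind the 64-character figure can be checked with a small sketch. This is only an illustration of the counting argument above, not anything defined by RFC 2047 or RFC 2231; the function names and default arguments are hypothetical, chosen to match the minimal case described (charset "IT", a one-character encoding, one character of encoded-text).

```python
# Illustrative sketch of the length argument; names are hypothetical.

MAX_ENCODED_WORD = 75  # limit stated in the prose of RFC 2047

# Literal characters in an encoded-word: "=?" charset "?" encoding "?" text "?="
LITERALS = len("=?") + len("?") + len("?") + len("?=")  # 6
LANG_DELIM = len("*")  # delimiter added by RFC 2231 before the language tag


def max_language_tag_length(charset="IT", encoding="Q", min_encoded_text=1):
    """Longest language tag that still fits in one encoded-word."""
    overhead = (LITERALS + LANG_DELIM + len(charset)
                + len(encoding) + min_encoded_text)
    return MAX_ENCODED_WORD - overhead


def fits(charset, language, encoding, encoded_text):
    """True if the assembled encoded-word is within the 75-character limit."""
    word = f"=?{charset}*{language}?{encoding}?{encoded_text}?="
    return len(word) <= MAX_ENCODED_WORD


print(max_language_tag_length())              # minimal case: 64
print(max_language_tag_length("ISO-8859-1"))  # longer charset name: 56
```

With a longer charset name the bound drops accordingly, which is the point made above: the usable length depends on the other fields in the same encoded-word, so it is a property of the application rather than of the language tag syntax.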
My suggestions, then, in response to Bruce Lilly's comments are:

- that we add a note prominently in section 2.1 of the draft explaining
  that some applications may impose limits on the lengths of language
  tags, and cite RFC 2231 as an example

- that we revise the ABNF for the 'grandfathered' production rule to

      grandfathered = 1*3ALPHA *("=" 1*8alphanum)

- that we add a note in the discussion of extensions stating that, when
  a language tag instance is to be used in a specific, known protocol,
  it is advisable that the language tag not include extensions not
  supported by that protocol (text can be added pointing out the
  inadvisability of including unrecognized extensions in the case of
  protocols that impose upper limits on the length of strings that may
  contain a language tag)

- that recommendation 4 in section 2.4.2 be changed to say that
  extensions should not be removed except in the case that the language
  tag instance is to be inserted into a specific protocol known not to
  support the extension

- that the language subtag registration form include an additional
  field following #7 (recommended prefixes for variants) asking for a
  reasonable estimate and exemplar of the maximum length anticipated
  for language tags using the requested variant

- that a requirement on extension RFCs be added in section 3.4 stating
  that they must include some explicit discussion of concerns related
  to upper bounds on the length of language tags using the given
  extension

- that we do not attempt any other changes to the ABNF to impose an
  upper bound on the length of language tags

- that we add a note in section 3.1 indicating that descriptions in
  registry entries for ISO 639, ISO 3166 or ISO 15924 identifiers are
  intended only to indicate the meaning of that identifier as defined
  in the source ISO standard at the time it was added to the registry,
  and that the descriptions are not replacements for the content of the
  source standards themselves

- that we do not need to change the proposed format of the registry to
  include descriptions in multiple languages


Peter Constable
Microsoft Corporation

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf