> Date: 2004-12-11 11:53
> From: "Peter Constable" <petercon@xxxxxxxxxxxxx>
> To: ietf-languages@xxxxxxxxxxxxx, ietf@xxxxxxxx
>
> Our disagreement amounts to a basic question of whether parsers should be written based on the ABNF alone, or based on the ABNF plus other constraints provided in the spec. Clearly, I think anyone writing a parser should consider other constraints as well.

No, I agree that a parser should take normative text into account, but I feel that there should be a reasonable effort made to make the ABNF agree with that normative text -- otherwise there's little point in providing ABNF.

> As mentioned, the limit is imposed by other tight constraints on 'grandfathered'; you have already identified that the longest registered tag under RFC 3066 is 11 octets in length, therefore a 'grandfathered' tag can be at most 11 octets in length.

But the constraints probably aren't as tight as you believe; the draft specifically permits a future revision to allow a primary subtag longer than 8 octets, or not purely alphabetic, etc.

> a de-facto upper limit of 11 (subject to change if new tags are registered before the proposed spec is accepted).

We're agreed on that, for the present draft, but apparently Mark Davis disagrees. And I am concerned about the loophole left for future revisions.

> > > We could impose some upper limits on these things...
> >
> > That leaves the extension portions' length at up to
> > 25 * (1 + 1 + 8 * 9) = 1850 octets, not taking any other parts
> > of a tag into account!  That's way too long (the RFC 2047
> > limit for an encoded-word is 75 octets, including charset tag,
> > some text, and some syntactic glue in addition to the language
> > tag).
>
> The problem already exists in RFC 3066. Even apart from private-use tags, tomorrow someone could request a registration for a tag that's 87 octets long, and there's nothing in RFC 3066 that would prohibit acceptance.

One would hope that, under RFC 3066 rules, the reviewer, a list subscriber, or an Applications Area Director would recognize the conflict with RFCs 2047/2231 and would object. If indeed that were to happen literally tomorrow, I am quite sure that an objection would be made. The situation is quite different under the draft proposal, where registration of a complete tag is not required, and where there are no upper bounds on length of a tag.

> > > So, I think Bruce has identified a valid issue here. I personally would
> > > not have characterized it as greatly exacerbating, though,
> >
> > IMO, an increase from 11 octets worst-case, which is tolerable
> > for constructing RFC 2047/2231 encoded-words, to >> 1850
> > octets, which exceeds by a large margin what can be handled
> > in a Content-Language or Accept-Language message header
> > field, constitutes "greatly exacerbated".
>
> Repeating my previous point, RFC 3066 doesn't stop a registered tag from being 10^100 octets in length.

RFC 3066 provides a registration mechanism that can be trusted to prevent that; in particular, the Applications Area Directors are supposed to look out for issues affecting the core Internet applications protocols.

> I suggest that wording be added to the draft giving a strong recommendation to users that they not use tags the complete length of which exceeds 75 characters.

75 octets would be too large for a language-tag used in an encoded word (perhaps different limits could be specified for different uses, but one would have to be careful about implicit re-use between applications). An encoded-word has the form:

    =?<charset>*<language-tag>?<encoding>?<text>?=

and is limited to a total of 75 octets. Eliminating the syntactic glue (7 octets, unbracketed above) leaves a total of at most 68 octets for text, charset, encoding, and language-tag. There are at present two encodings, specified with 1-octet tags.
Assuming that longer encoding tags are not required, that leaves 67 octets for charset, language-tag, and text. The text must be at least four octets in order to accommodate B encoded text, leaving 63 octets at most for charset and language-tag (ideally, one would prefer to leave more room than that for text). It is guaranteed (in theory, if not in practice) that there will be a charset name of no more than 40 octets for each charset, but that is not necessarily the preferred name (there has been some discussion about possibly reducing that limit). That leaves about 23 octets for a language-tag as an upper bound for use in an encoded-word. Obviously that hasn't been a problem in practice to date; the longest registered language tag is less than half that length.

> > By deferring to the bilingual ISO lists for language and country
> > tags, 3066 at least provided a minimal degree of internationalization.
> > By explicitly limiting description fields to English and restricting
> > the charset to US-ASCII, the draft proposal takes a giant leap
> > backwards.
>
> The US-ASCII limitation existed in RFC 3066, so is not new.

No, I'm talking about the character set of the description, which currently resides in the ISO lists, and is certainly not limited to ANSI X3.4 in those lists. Under the draft proposal, the description is limited to ANSI X3.4, which is a problem for the description for UN region 248, whose description includes an A-ring character, which is not in X3.4. I note that BCP 18 section 3.1 specifies that it MUST be possible to use the UTF-8 charset, so the specification of the registry as solely X3.4 appears to violate that provision of BCP 18.

> On the more general point, I believe you are mistaking i18n concerns with localization concerns:

No, I am concerned about changing what is currently internationalized (to an admittedly small extent) into something that is strictly monolingual in a severely restricted charset.
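
For anyone who wants to check the encoded-word length budget worked through earlier in this message (75 octets in total, 7 of syntactic glue, 1 for the encoding tag, at least 4 of text, up to 40 for the charset name), here is a rough Python sketch. The constant and function names are purely illustrative and come from no specification; the numeric values are the ones quoted in this thread, and the 40-octet charset figure in particular is the assumption stated above rather than something the code verifies.

```python
# Sketch of the encoded-word length budget discussed in this message.
# All names are illustrative; the values are the ones quoted in the thread.

ENCODED_WORD_MAX = 75   # limit for a complete encoded-word, in octets
ENCODING_TAG_LEN = 1    # the two current encodings use 1-octet tags
MIN_TEXT_LEN = 4        # shortest B-encoded text
CHARSET_NAME_MAX = 40   # charset name length assumed to be available

# Syntactic glue in =?<charset>*<language-tag>?<encoding>?<text>?=
GLUE_LEN = len("=?") + len("*") + len("?") + len("?") + len("?=")


def language_tag_budget(charset_len=CHARSET_NAME_MAX, text_len=MIN_TEXT_LEN):
    """Octets left for the language tag once the other parts are counted."""
    return (ENCODED_WORD_MAX - GLUE_LEN - ENCODING_TAG_LEN
            - text_len - charset_len)


def encoded_word_length(charset, language_tag, encoding, text):
    """Length in octets of an encoded-word built from ASCII parts."""
    return len(f"=?{charset}*{language_tag}?{encoding}?{text}?=")


# Worst-case extension length quoted in the thread.
WORST_CASE_EXTENSIONS = 25 * (1 + 1 + 8 * 9)

if __name__ == "__main__":
    print(GLUE_LEN)               # 7
    print(language_tag_budget())  # 23
    print(WORST_CASE_EXTENSIONS)  # 1850
    print(encoded_word_length("US-ASCII", "en", "B", "SGk="))  # 22
```

With a shorter charset name the budget for the tag grows accordingly, but 1850 octets of extensions cannot fit under any choice of charset, since the whole encoded-word is capped at 75.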

> > > I don't quite understand what the critique is here: what is there to
> > > internationalize about language tags?
> >
> > There should probably be a reference (at least informative)
> > pointing to BCP 18 and mentioning that the language tags
> > defined provide a means of labeling the language of text,
>
> Have you not read the abstract in the draft? [...]
> Or the introduction?

I have; neither mentions BCP 18 or the core Internet protocols.

> > The draft (if/when approved) should also indicate that
> > it updates BCP 18, which refers to RFC 1766.
>
> Is this right? This draft is not a replacement for RFC 2277, or an addendum to it. RFC 2277 also refers to RFC 1958, which was updated by RFC 3439, but surely RFC 3439 doesn't state that it updates BCP 18? (RFC 2277 does have a section with significant overlap in topic, though, so perhaps this makes sense. I'm not well-enough versed in IETF document process to know.)

N.B. "update" != "replacement". If the draft obsoletes 3066, which obsoletes 1766, then it affects 2277. 3066 should probably have so indicated also...

> > Given the divergence noted above from RFC 3066's use
> > of multilingual reference lists, the Internationalization
> > considerations section should include a synopsis of the
> > approach chosen (viz. to restrict description to English) and
> > the rationale for that choice (see BCP 18 section 6).
>
> Again, this is a localization issue, not an internationalization issue. I do not consider this necessary or even appropriate.

No, it's relevant to the extent that BCP 18 specifies that text strings are subject to internationalization, and the description field in the draft-proposed registry protocol certainly appears to be a text string (although the draft does not clearly state whether it is a text string or a protocol element).

> > > >   implications (ISO 8601 date format parsing).
> > >
> > > As mentioned above, this really is a non-issue.
> >
> > It's an issue (esp. in light of the finger pointing regarding
> > accessibility to ISO 639/3166).
>
> As has been pointed out, there is no such finger-pointing in the draft.

The finger-pointing accompanied the new last call and was used as justification for replacement of RFC 3066 with the proposed scheme. If indeed accessibility is a non-issue, then the justification for the proposed scheme, in whole or in part, rests solely on other considerations, such as they might be.

> > Again, it is an issue that imposes requirements on language
> > tag parsers.  What you've shown is that the ABNF is not
> > consistent with what was desired to be expressed, and
> > that makes it an issue that needs to be addressed.
>
> Again, I believe the bigger issue is not getting the ABNF to express what was desired

No, it is a concern because of a loophole left for future revisions to incorporate syntax which is not currently permitted by 3066, but which is inexplicably permitted by the draft's proposed ABNF.

> > > The maximal length issue exists just as much
> > > in RFC 3066 due to private-use tags; it is a technical concern that
> > > might be worth reviewing in RFC 3066bis, however; but it is not
> > > insurmountable, and not a new problem.
> >
> > Private-use carries its own considerable baggage; aside from
> > that, the draft proposal increases the length of non-private
> > tags that affect both protocol design and implementations
> > from a worst case maximum of 11 octets under RFC 3066...
>
> Worst case at present; a month from now it could be unlimitedly larger.

No, the current registration review process in conjunction with the requirement would prevent that. The draft proposal decouples use from registration, which directly leads to unlimited length, and changes the review process (in unclear ways which I have not yet had time to review fully).

> But I've accepted that it would be an improvement to add constraints on overall length.

That's a start.