> RE: New Last Call: 'Tags for Identifying Languages' to BCP
> Date: 2004-12-10 16:37
> From: "Peter Constable" <petercon@xxxxxxxxxxxxx>
> To: ietf-languages@xxxxxxxxxxxxx
>
> Bruce Lilly's message makes several inaccurate statements against the
> proposed draft, and misrepresents some of the changes being made. My
> main concern is that I don't know where it was circulated. I might be
> wrong, but I get the impression it was written with a different audience
> in mind and then copied here.
>
> > -----Original Message-----
>
> > > There are problems with the RFC 3066 definition of generative tags,
> > > however. The ISO 639 and ISO 3166 standards are not freely available
> > > and evolve over time.
> >
> > Accessibility has not been a problem for this implementor...
>
> I agree with Bruce, that accessibility of ISO 639 and ISO 3166 has not
> been the issue. Unfortunately, his comments do not indicate what the
> real issues were.

My comments are in response to the "New Last Call" made on the
ietf-announce list. They are in response to the text which accompanied
that new last call and the text of draft-phillips-langtags-08.txt dated
November 2002. The specific claim that accessibility has been a problem
was made in the text accompanying the new last call (q.v.). For those
not subscribed to the ietf-announce list, the text of the new last call
can be seen at
http://www1.ietf.org/mail-archive/web/ietf-announce/current/msg00755.html

> > > The largest change in the specification is that it modifies the
> > > structure of the language tag registry. Instead of having to obtain
> > > lists of codes from five separate external standards...
> >
> > Contrary to the implicit claim, the ISO documents mentioned
> > above comprise two standards (available in two languages each),
> > not "five separate external standards".
>
> RFC 3066 made reference to ISO 639-1, ISO 639-2 and ISO 3166-1; the
> proposed replacement adds ISO 15924.
> I would count that as four ISO
> standards. Up-to-date code tables for all four are readily available.

For the purpose of implementing language-tag validation, the ISO 639
list includes both the 2- and 3-character codes in a single document.
The claim (again from text accompanying the new last call) states that
the draft proposal differs from 3066 in that 3066 (the text alleges)
requires "lists of codes from five separate external standards" -- in
fact two lists suffice for 3066 implementations.

> I think this is a serious misrepresentation of the intent of the
> proposal: the draft nowhere suggests, let alone declares, that the
> source ISO standards are irrelevant.

A poor choice of words on my part. The text and draft suggest that only
the proposed new registry should be consulted, and the draft clearly
specifies that the description of all subtags is to be provided in
English (only).

> Rather, the intent of the
> comprehensive registry is to ensure stability in IETF implementations by
> protecting them from unpredictable changes in ISO standards, such as the
> re-definition of "CS" as a country identifier not long ago. The
> denotation of identifiers listed in the registry is based on their
> definition in the ISO standards, not on an informative descriptor
> provided in the registry;

It's not clear to me that the proposal will provide protection against
the whims of politicians. If the definition of "CS" as a country code
changes again under the proposed scheme, how is one to determine
specifically what some archived language-tag referred to at some point
in time? I'm not particularly concerned about that problem, as I am
resigned to instability associated with anything specified by
politicians (and that includes the UN region codes).

> and as Bruce quite clearly pointed out, those
> source standards are readily accessible.
> So the suggestion that
> implementers will no longer have access to French-language names from
> the source ISO standards simply is vacuous.

But if the proposed new registry's description of "CS" says "foo" and
the ISO standard code list says "bar", what's an implementor supposed
to present to a user as *the* description associated with "CS"?

> As for concerns of Anglo-centricity, I'm sure that the authors had no
> anti-French motive, and would be open to suggestions as to how that
> could be addressed.

One possibility would be two description fields. But the registry would
need a charset closer to ISO-8859-1 than to ANSI X3.4 as currently
specified. Or an encoding scheme.

> Surely, though, this is not a technical argument
> against the proposal.

Not purely technical, though it presents problems for existing
implementors who provide bilingual support. Eliminating bilingual
descriptions for the language, country (and UN region) codes leaves
implementors in a quandary.

> > The ABNF in the draft permits all of the following tags which
> > are not legal per the RFC 3066 ABNF:
> >   supercalifragilisticexpialidoceus
> >   y-----
> >   x1234567890abc
> >   a123-xyz
>
> In fact, none of these is permitted by the ABNF of the draft.
ABNF from the draft:

   Language-Tag = (lang
                   *("-" extlang)
                   ["-" script]
                   ["-" region]
                   *("-" variant)
                   *("-" extension)
                   ["-" privateuse])
                / privateuse               ; private-use tag
                / grandfathered            ; grandfathered registrations

   lang            = 2*3ALPHA              ; shortest ISO 639 code
                   / registered-lang
   extlang         = 3ALPHA                ; reserved for future use
   script          = 4ALPHA                ; ISO 15924 code
   region          = 2ALPHA                ; ISO 3166 code
                   / 3DIGIT                ; UN country number
   variant         = ALPHA (4*7alphanum)   ; registered variants
                   / DIGIT (3*7alphanum)
   extension       = singleton 1*("-" (2*8alphanum))
                                           ; extension subtag(s)
   privateuse      = "x" 1*("-" (1*8alphanum))
                                           ; private use subtag(s)
   singleton       = ALPHA                 ; single letters
                                           ; (except x, which has
                                           ; special meaning)
   registered-lang = 4*8ALPHA              ; registered language subtag
   grandfathered   = ALPHA *(alphanum / "-")
                                           ; grandfathered registration
   alphanum        = (ALPHA / DIGIT)       ; letters and numbers

Note that the RFC 2234 definition of an asterisk in front of a
production (with no adjacent numbers, as is the case in the
"grandfathered" production) means zero or more repetitions (without
upper bound) of the production to the right of the asterisk. That means
that the "grandfathered" production (which is an alternative in the
Language-Tag production) will match any of the following text tags
(comments to the right separated by a semicolon):

   x          ; ALPHA followed by zero repetitions
   xa         ; ALPHA followed by one ALPHA (see alphanum)
   x-         ; ALPHA followed by one HYPHEN
   supercalifragilisticexpialidoceus
              ; ALPHA followed by many ALPHAs (see alphanum)
              ; (example previously given)
   x1234567890abc
              ; ALPHA followed by 13 alphanums (as previously given)
   a123-xyz   ; ALPHA followed by three DIGITs (see alphanum)
              ; followed by one HYPHEN followed by three ALPHAs
              ; (example previously given)
   y-----     ; ALPHA followed by five HYPHENs
              ; (example previously given)

I say the ABNF from draft -08 (quoted above) allows those; you say no.
Either you're looking at different ABNF or one or more of us doesn't
understand ABNF.
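
The matches claimed above can be checked mechanically by transcribing
the "grandfathered" production into a regular expression. What follows
is a minimal Python sketch, not anything taken from the draft. The
RFC 3066 pattern is a restatement of that RFC's grammar
(Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)), which is not
quoted in this message and should be treated as an assumption; the
names GRANDFATHERED, RFC3066, EXAMPLES and check are made up for
illustration.

```python
import re

# Draft -08, as quoted above: grandfathered = ALPHA *(alphanum / "-")
GRANDFATHERED = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# RFC 3066 grammar, restated here as an assumption (not quoted in this message):
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# The seven example tags discussed above.
EXAMPLES = [
    "x",
    "xa",
    "x-",
    "supercalifragilisticexpialidoceus",
    "x1234567890abc",
    "a123-xyz",
    "y-----",
]


def check(tag):
    """Return (matches draft 'grandfathered', matches RFC 3066) for one tag."""
    return (
        GRANDFATHERED.fullmatch(tag) is not None,
        RFC3066.fullmatch(tag) is not None,
    )


if __name__ == "__main__":
    for tag in EXAMPLES:
        draft_ok, rfc_ok = check(tag)
        print(f"{tag:36} grandfathered: {draft_ok!s:6} RFC 3066: {rfc_ok}")
```

Under these assumptions, all seven tags match the draft's
"grandfathered" alternative, while only "x" and "xa" match the
RFC 3066 pattern.
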
If you wish to convince me that I don't understand it, you'll have to
do better than simply claiming that I'm wrong with no supporting
reasoning.

> > Specifically, the draft allows, and RFC 3066 disallows:
> >   subtags more than 8 octets in length
>
> This is incorrect. It was true of an earlier draft, but that was
> changed.

The "new last call" was for version -08; I downloaded it from the URI
in the new last call and copied the ABNF above from that. My analysis
is above. I await your rebuttal or retraction.

> >   hyphens which do not separate subtags
> >   zero-length subtags
>
> These near-equivalent statements are incorrect. No hyphen may be
> permitted without a non-initial sub-tag, and no sub-tag can be an empty
> string.

See the "y-----" example above, based on the published ABNF. Again, I
await your rebuttal or retraction.

> >   primary tags which are not purely alphabetic
>
> This is incorrect. A primary sub-tag must be 2*3ALPHA or 4*8ALPHA, or
> "i" or "x".

See the "a123-xyz" example above (in RFC 3066 parlance, the "a123" part
is the primary tag, which clearly contains DIGITs). One more time, I
await your rebuttal or retraction.

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf