Bruce Lilly scripsit:

> > > Precisely; an RFC 1766/3066 parser, based on the 1766 and
> > > 3066 specifications, can expect four classes of language tags:
> > > 1. ISO 639 language code as the primary subtag, optionally
> > >    followed by an ISO 3166 country code as the second tag
> > > 2. i as the primary tag; complete tag registered
> > > 3. x as primary tag; private-use
> > > 4. some other IANA-registered complete tag
> > >
> > > "sr-CS-Latn" fits category 1. "sr-Latn-CS" fits none.
> >
> > You are mistaken; "sr-Latn-CS" fits your category 4.
>
> I think not; it is not a registered tag.

Technically correct; however, it is a potentially registerable tag,
and as such an RFC 3066 parser that does not have access to the IANA
registry will accept it (in the language of the new draft, it is
well-formed but not valid).

> There is a possibility
> that it could fit through the "no rules apart from the syntactic
> ones for the third and subsequent tags" given the registration of
> "sr-Latn" (you are correct about that; I missed it). In that
> respect, the choice of examples is poor; consider "en-US-Latn"
> (category 1) vs. "en-Latn-US" (no category).

In fact, neither of these is currently valid, but both are
registerable. Category 1 tags cannot themselves contain third
subtags, though they can match tags which contain third subtags;
this is a fundamental error in your reading of RFC 3066 which
infects the rest of your argument.

> Right. I.e. they should be able to deal with superfluous stuff
> on the right. But not script tags that suddenly appear between
> language code and country code.

A validating RFC 3066 parser should *not* accept "superfluous stuff
on the right". You are confusing validation with range matching.

> Again, poor choice of example. Consider "en-Latn-US" vs. "en-US-Latn".
> If one wants (presumably text) in US English in Latin script, the
> latter string is a valid RFC 3066 language tag which matches the
> known semantics of "en-US", even if the RFC 3066 parser has no way
> of interpreting the 3rd (and any subsequent) subtag(s).

Recte: it is a well-formed but invalid tag which matches etc.

> The former
> is not a *valid* (neither registered in its entirety, nor beginning
> with language code and country code) language-tag, nor could it be
> matched by an RFC 3066 parser to anything greater than plain "en",

Correct.

> and that's presuming that such a parser would even attempt to match
> a known invalid tag to the set of valid tags.

If it processes en-US-Latn, then it is handling well-formed but
invalid tags, and should process en-Latn-US as well (and match it
against "en").

> No, the RFC 3454 considerations for what is valid are based on
> protocol considerations, not on a Quixotic quest for "stability"
> of nations.

The draft does not attempt to stabilize countries, only the codes
applied to them. ISO 3166, as has been amply demonstrated, does not
and cannot do so, since it codes for the names of countries, not the
countries themselves.

-- 
John Cowan  jcowan@xxxxxxxxxxxxxxxxx
http://www.ccil.org/~cowan  http://www.reutershealth.com
They do not preach that their God will rouse them
A little before the nuts work loose.
They do not teach that His Pity allows them
to drop their job when they damn-well choose.
        --Rudyard Kipling, "The Sons of Martha"
_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf
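
The three notions argued over in this message (well-formed, valid,
and range matching) can be sketched in Python. This is an
illustration under stated assumptions, not a reference
implementation: the function names are invented here, and the sets
LANGUAGES, COUNTRIES and REGISTERED are tiny hypothetical stand-ins
for ISO 639, ISO 3166 and the IANA registry. The treatment of "x"
tags as always valid is likewise a simplification.

```python
import re

# RFC 3066 syntax: a primary subtag of 1-8 letters, followed by any
# number of subtags of 1-8 letters or digits, separated by hyphens.
WELL_FORMED = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

# Hypothetical, deliberately tiny stand-ins for the real code lists.
LANGUAGES = {"en", "sr"}              # ISO 639 sample
COUNTRIES = {"US", "CS"}              # ISO 3166 sample
REGISTERED = {"sr-latn", "i-klingon"}  # IANA registry sample


def is_well_formed(tag):
    """Syntax check only; needs no access to any registry."""
    return WELL_FORMED.fullmatch(tag) is not None


def is_valid(tag):
    """Well-formed AND in one of the four classes from the thread."""
    if not is_well_formed(tag):
        return False
    subtags = tag.split("-")
    if subtags[0].lower() == "x":
        return True  # private use (simplifying assumption)
    if tag.lower() in REGISTERED:
        return True  # complete tag registered ("i-..." or other)
    if subtags[0].lower() in LANGUAGES:
        if len(subtags) == 1:
            return True
        # Category 1 stops at the country code: no third subtag.
        if len(subtags) == 2 and subtags[1].upper() in COUNTRIES:
            return True
    return False


def matches(language_range, tag):
    """Range matching: the range equals the tag, or is a prefix of it
    that is immediately followed by a hyphen."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")
```

With these sample lists, "en-US-Latn" and "en-Latn-US" are both
well-formed and both invalid; the range "en-US" matches the first
but not the second, and the range "en" matches both.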