> Date: 2005-01-01 21:58
> From: "Peter Constable" <petercon@xxxxxxxxxxxxx>
> To: ietf-languages@xxxxxxxxxxxxx, ietf@xxxxxxxx

> > > 2. RFC 3066 did not require every possible combination of language
> > > subtag + country subtag to be registered.
> >
> > None *could* be registered.

Even if by some oversight or lapse of judgment the tag "en-US" were to be registered, its interpretation by a parser would be as an ISO 639 language code followed by an ISO 3166 country code. Such a registration would therefore be pointless. In practice, therefore, it simply wouldn't happen.

> > > Indeed, Section 2.2 of RFC
> > > 3066 specifically says such combinations "do not need to be registered
> > > with IANA before use." Yet you criticize RFC 3066bis for allowing
> > > "en-Latn-US-boont" to be used without being registered as a unit.
> >
> > Yes, because an RFC 3066 parser cannot make any sense of it.
> > I.e. the proposed draft lacks "backwards compatibility".
>
> It would be entirely possible for "en-Latn-US-boont" to be registered under the terms of RFC 3066.

But it hasn't been. No RFC 3066 parser will therefore find that complete tag in its list of IANA registered tags, nor will it be able to interpret "Latn" as an ISO 3166 2-letter country code.

> In what sense would any existing RFC 3066 parser (assuming that it conforms to RFC 3066) not be able to make any more or less sense of that than any other registered tag?

You're missing the critical factor: it is NOT a registered tag -- an RFC 3066 parser has no way of recognizing it.

> > > > [de-AT-1901, incidentally, (as an example) does not meet the RFC 3066
> > > > requirement of 3 to 8 characters in the second subtag for registration
> > > > with IANA...].
>
> There is nothing in RFC 3066 that says a registered tag must have 3 to 8 characters in the second subtag. It simply requires that any tag in which the second subtag is 3 to 8 letters must be registered.
   The following rules apply to the second subtag:

   - All 2-letter subtags are interpreted as ISO 3166 alpha-2 country
     codes from [ISO 3166], or subsequently assigned by the ISO 3166
     maintenance agency or governing standardization bodies, denoting
     the area to which this language variant relates.

   - Tags with second subtags of 3 to 8 letters may be registered with
     IANA, according to the rules in chapter 5 of this document.

   - Tags with 1-letter second subtags may not be assigned except after
     revision of this standard.

That does not permit tags with two-letter second subtags to be registered in the IANA registry; it permits that only for "Tags with second subtags of 3 to 8 letters". Granted, it could be clearer.

> > > Absolutely correct. The need for RFC 3066 tags that go beyond language
> > > + country has gotten to the point where they have been registered in
> > > violation of the RFC. Does that not indicate the need for a revision of
> > > the core specification?
> >
> > No, it indicates that the review/registration procedure has violated
> > the rules of syntax specified by BCP, and as a result has caused
> > problems of a nature similar to those being criticized w.r.t. ISO
> > MA action (pot to kettle: "you're black").
>
> Um, this entire sub-thread was based on an invalid premise. No rules of syntax were violated in any review/registration procedure.

See the direct quote from RFC 3066 above.

> There is no reason to create a separate mechanism. When identifying textual content,

Language is not exclusively associated with text. It is also a characteristic of spoken (sung, etc.) material (but script is not).

> the identity of the writing system

Writing doesn't apply to spoken material, etc. There is nothing in RFC 3282 or MIME that requires that Content-Language and/or Accept-Language fields be used exclusively with written text.

> *is* very closely related to the identity of the language variety.
> Indeed, the writing system is generally going to be of greater importance than distinctions such as dialect

For spoken material!?! I don't think so.

> It is not adequate to simply say that script can be identified from the charset or range of codes used. In the former regard, a charset of UTF-8 provides no information.

Note my use of "or" not "and". I certainly did not state that the information could be obtained from charset alone in all cases.

> In the latter regard, relying on the range of codes used in content does not provide a way to request an HTTP server to return pages that are (say) Azeri in Latin script rather than Cyrillic script. (You have mentioned numerous times the need to respect how language tags are used in Internet protocols; pot to kettle... )

The analogous way to handle that in Internet protocols would be via Content-Script and Accept-Script where relevant (which they would not be for audio media).

> > > Perhaps someone will make the case that
> > > Japanese written in Romaji needs to be specially indicated and will
> > > write a request for "ja-Latn", and they will be right too. Allowing
> > > script subtags to be used generatively, instead of having to be
> > > individually registered, solves this real problem.
> >
> > In an inappropriate way. Without consideration for backwards
> > compatibility. In violation of the BCP that specified the syntax
> > and registration procedure.
>
> Not inappropriate at all.

Specifying script for audio material is as inappropriate as specifying charset. In Internet protocols, we do not burden protocols with having to interpret charset information for non-text material; we should not do so for script information.

> And all your repeated comments about lack of consideration for backwards compatibility and violation of syntax and procedures of BCP47 have been shown to be invalid.

Sorry -- saying so doesn't make it so.
I have explained in detail that an RFC 1766/3066 parser cannot be expected to make sense of unregistered "sr-Latn-CS" etc. I have pointed to specific second subtag length requirements in RFC 3066 for registration.

> > RFC 3066 doesn't require "haw-US", and if encountered provides for
> > matching it (in an "accept" role) with "haw" (as content to be
> > provided). "sr-Latn" and "sr-Latn-CS" cannot be matched by an
> > RFC 3066-compliant process to anything, since they do not fit the
> > RFC 3066 syntax for well-formed language tags.
>
> Certainly they do; and certainly an RFC 3066 parser will match "sr" with "sr-Latn" or "sr-Latn-CS", and "sr-Latn" with "sr-Latn-CS".

No, a strict RFC 3066 parser will not be able to identify "sr-Latn" or "sr-Latn-CS" as valid tags.
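
For illustration only, here is a minimal Python sketch of the distinction being argued in this thread. It separates three questions: whether a tag fits the RFC 3066 ABNF, how the second-subtag rules quoted earlier classify it, and whether the RFC 3066 section 2.5 prefix rule matches a language-range against it. The function names, the `registered_tags` parameter, and the "recognized" policy are all assumptions made for this sketch. They do not come from RFC 3066 or from any deployed parser. The sketch also ignores the "i" and "x" primary subtags and does not check subtags against the ISO 639 or ISO 3166 code lists.

```python
import re

# RFC 3066 ABNF: Language-Tag = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
_WELL_FORMED = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag):
    """Syntax check only: does the tag fit the RFC 3066 ABNF?"""
    return _WELL_FORMED.fullmatch(tag) is not None


def second_subtag_kind(tag):
    """Classify the second subtag by length, following the quoted rules."""
    parts = tag.split("-")
    if len(parts) < 2:
        return "none"
    length = len(parts[1])
    if length == 1:
        return "reserved"          # not assignable without revising the standard
    if length == 2:
        return "iso3166-country"   # interpreted as an ISO 3166 alpha-2 code
    return "registered-only"       # 3 to 8 characters: subject to IANA registration


def is_recognized(tag, registered_tags):
    """Hypothetical strict policy: accept a tag if it is registered, or if it
    is well-formed and its second subtag is absent or a 2-letter code.
    registered_tags is a caller-supplied set of lowercased tags."""
    if not is_well_formed(tag):
        return False
    if tag.lower() in registered_tags:
        return True
    return second_subtag_kind(tag) in ("none", "iso3166-country")


def range_matches(language_range, tag):
    """RFC 3066 section 2.5: a range matches a tag if it equals the tag, or
    equals a prefix of the tag that is immediately followed by '-'."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


if __name__ == "__main__":
    registered = set()  # placeholder: the caller would load the IANA registry
    for tag in ("en-US", "sr-Latn", "sr-Latn-CS"):
        print(tag, is_well_formed(tag), second_subtag_kind(tag),
              is_recognized(tag, registered))
    print(range_matches("sr", "sr-Latn-CS"))
```

Under these assumptions, "sr-Latn-CS" passes the syntax check and is matched by the range "sr", yet the strict policy does not recognize it until the caller's registry contains it.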