>3066) that go beyond the patterns "ll(-CC)" and "lll(-CC)". If we stick with RFC 3066, we will have no way of writing forward-compatible processors that will be able to do very useful matching.

I want to reinforce what Peter has said. In RFC 3066 we have already registered language tags like zh-Hans and zh-Hant. Nobody can parse out the script in such a tag, because RFC 3066 does not provide for identification of the pieces.

During the development of 3066bis, we have been holding off on registering all of the country variants of these, because we didn't want them to be redundant with the tags that can be generated under 3066bis. If we don't get 3066bis, then we will end up needing to register the combinations zh-Hans-CN, zh-Hant-CN, zh-Hans-HK, zh-Hant-HK, zh-Hans-MO, zh-Hant-MO, zh-Hans-SG, zh-Hant-SG, zh-Hans-TW, zh-Hant-TW. And zh is just one example. There are many languages that can be written in different scripts, where it is important in practice to be able to distinguish the script as well as the country.

There are very good reasons to have the script code before the country code, because differences by script swamp differences by country. Suppose that you are composing a web page by pulling together different pieces of data, and your target is simplified Chinese for Hong Kong. For one of those data sources, there is not an exact match. Given a choice between a data source in simplified Chinese and one in Hong Kong Chinese (but traditional), you really want to pick the simplified Chinese one. That is reflected in putting the script subtag second (zh-Hans-HK), so that the common process of truncation gets the right result: zh-Hans-HK falls back to zh-Hans, not to zh-HK. This is similar to the reason why the language code comes before the country code. If we had the order CH-fr, then we could end up mixing French and German on the same page, because we would fall back (for one of the data sources) from CH-fr to CH, which could be German.
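The truncation argument can be illustrated with a minimal Python sketch: drop subtags from the right until one of the available tags matches. The names `truncation_fallback` and `lookup` and the sample source lists are hypothetical, and this is a simplified illustration of truncation-based fallback, not the matching procedure of RFC 3066 or of the draft.

```python
from typing import Iterator, List, Optional


def truncation_fallback(tag: str) -> Iterator[str]:
    """Yield the tag itself, then each shorter prefix obtained by
    dropping the rightmost hyphen-separated subtag."""
    subtags = tag.split("-")
    for n in range(len(subtags), 0, -1):
        yield "-".join(subtags[:n])


def lookup(requested: str, available: List[str]) -> Optional[str]:
    """Return the first available tag reached by truncating `requested`,
    or None. Matching ignores case, since tags are case-insensitive."""
    by_lower = {t.lower(): t for t in available}
    for candidate in truncation_fallback(requested):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return None


# Hypothetical data sources: one simplified Chinese, one Hong Kong (traditional).
sources = ["zh-Hans", "zh-HK"]
print(lookup("zh-Hans-HK", sources))  # zh-Hans: script kept, region dropped

# A hypothetical region-first order falls back to the bare region instead.
print(lookup("CH-fr", ["CH", "fr"]))  # CH: which could be German content
```

Note that truncating zh-Hans-HK never reaches zh-HK at all, so with script before country the traditional Hong Kong source can only be chosen deliberately, never by accident.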
Mark

----- Original Message -----
From: "Peter Constable" <petercon@xxxxxxxxxxxxx>
To: <ietf-languages@xxxxxxxxxxxxx>; <ietf@xxxxxxxx>
Sent: Thursday, January 06, 2005 07:42
Subject: RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

> From: ietf-languages-bounces@xxxxxxxxxxxxx [mailto:ietf-languages-bounces@xxxxxxxxxxxxx] On Behalf Of ned.freed@xxxxxxxxxxx
> Again, your pejorative dismissal of other people's concerns does not mean your position is valid...
> Parsing almost never is. But simply parsing these tags is not, and never has been, the issue.

I think you guys are in violent agreement over country codes within a tag, and that the debate over interpreting the wording of RFC 3066 serves no purpose. I think the intent of Mark's dismissal has been to refute perceived-invalid objections, in which case we need to consider that the line between perceived-invalid and truly-invalid has been blurred simply by the volume of discussion (the noise factor). There have been some invalid objections that bear some similarity to comments Ned has made as he has tried to make his point. (For example, Bruce Lilly has claimed back-compatibility problems that do not exist, on the incorrect premise that RFC 3066 does not permit ISO 3166 country codes except as second subtags, or does not permit second subtags that are not country codes; at the moment I forget whether it was one, the other, or both.)

But Ned's concerns are legitimate, I think. I'd say they are not necessarily blocking issues for this draft, because a possible outcome of discussion is to characterize them as concerns about outstanding issues that need to be solved rather than as concerns over the draft itself; but I do think they are valid concerns that deserve attention.
In a nutshell, Ned was elaborating on a comment from Dave Singer that, once we have parsed a pair of tags and identified all the pieces, it's not a trivial matter to decide in every case how the two tags compare, and that there are factors that would arise if the draft were approved that did not exist under RFC 3066.

Again, I think this is a question that deserves discussion. In relation to the proposed draft, I don't see it as a particular problem with the draft. It is a problem that doesn't exist in RFC 3066, but only because RFC 3066 left us with bigger problems: it doesn't give us any way to identify the pieces we would encounter in registered tags (apart from hard-coded tables compiled from versions of the registry that predate a given implementation). RFC 3066 permits tags with all kinds of internal structures. That is a problem because it will never allow us to derive much useful information from a tag with any confidence: only the ISO 639 language category and, in some cases, a country category.

I predict that in the future we will be seeing a significant number of tags (whether sanctioned without registration by a successor to RFC 3066 or registered under RFC 3066) that go beyond the patterns "ll(-CC)" and "lll(-CC)". If we stick with RFC 3066, we will have no way of writing forward-compatible processors that will be able to do very useful matching.

What this draft does is impose some order on all the other patterns permitted within tags, and tell us what the different pieces must be. As a result, we have more named pieces to deal with, and we are presented with the question that Ned raised: "Now we have more named pieces than we did before; what do we do with them?" That is a problem that will need to be addressed.
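To make "named pieces" concrete, here is a minimal Python sketch that splits a tag into language, script, region, and variant subtags by position and shape. The shape rules (4 letters for a script; 2 letters or 3 digits for a region; 5 to 8 characters, or a digit plus 3 characters, for a variant) are simplified assumptions loosely modelled on the draft's structure, not quoted from it. `parse_tag` and `ParsedTag` are hypothetical names, and extended language subtags, grandfathered tags such as i-klingon, and private-use-only tags are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed subtag shapes (simplified; not quoted from the draft).
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


@dataclass
class ParsedTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)
    rest: List[str] = field(default_factory=list)  # extensions, private use: uninterpreted


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag into named pieces by position and shape:
    language, then optional script, region, and variants."""
    subtags = tag.split("-")
    parsed = ParsedTag(language=subtags[0].lower())
    i = 1
    if i < len(subtags) and SCRIPT.fullmatch(subtags[i]):
        parsed.script = subtags[i].title()  # conventional casing, e.g. Hans
        i += 1
    if i < len(subtags) and REGION.fullmatch(subtags[i]):
        parsed.region = subtags[i].upper()
        i += 1
    while i < len(subtags) and VARIANT.fullmatch(subtags[i]):
        parsed.variants.append(subtags[i].lower())
        i += 1
    parsed.rest = subtags[i:]
    return parsed


a = parse_tag("zh-Hans-HK")
b = parse_tag("zh-Hant-HK")
print(a.language == b.language, a.region == b.region, a.script == b.script)  # True True False
```

The parse itself is mechanical; what it leaves open is exactly Ned's question. zh-Hans-HK and zh-Hant-HK agree on language and region and differ only in script, and deciding how much that difference should count against a region mismatch is a matching policy, not something the parser can answer.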
But I don't think it's a reason to oppose the draft, since opposing the draft (or at least opposing any revision that introduces a richer internal structure) leaves us in a situation that must be characterized either as a worse problem or as turning our backs on increased functionality to meet real user needs.

Peter Constable

_______________________________________________
Ietf-languages mailing list
Ietf-languages@xxxxxxxxxxxxx
http://www.alvestrand.no/mailman/listinfo/ietf-languages

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf