This whole question of what 'matches' is subtle. Consider the case when I have a document that has variant content by language (e.g. different sound tracks), and the user indicates a set of preferred languages. If the content has "de-CH" and "fr-CH" (Swiss German and Swiss French), and a default "en" (English), and the user says he speaks "de-DE" and "fr-FR", on the face of it nothing matches, and I fall back to the catch-all default, which is almost certainly not the best result.
David, this isn't the half of it. The case you describe is actually one of the
easy ones, in that it can be handled by doing a "preferred" match on the entire
tag, with a "generic" match on the primary tag only having lesser precedence
but higher precedence than a fallback to a default.
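
A rough sketch of that three-tier precedence, for illustration only. The function names and the choice to try every user preference for a whole-tag match before trying any primary-tag match are my assumptions, not something taken from RFC 1766 or the draft:

```python
def primary(tag):
    """Return the primary subtag, lower-cased (tags compare case-insensitively)."""
    return tag.split("-", 1)[0].lower()


def match_language(available, preferred, default):
    """Pick a content tag for the user; returns (tag, kind_of_match).

    available: tags the content offers, e.g. ["de-CH", "fr-CH", "en"]
    preferred: the user's tags in preference order, e.g. ["de-DE", "fr-FR"]
    default:   the catch-all tag used when nothing else matches
    """
    by_full_tag = {tag.lower(): tag for tag in available}

    # Tier 1: "preferred" match on the entire tag.
    for want in preferred:
        hit = by_full_tag.get(want.lower())
        if hit is not None:
            return hit, "preferred"

    # Tier 2: "generic" match on the primary tag only.
    for want in preferred:
        for tag in available:
            if primary(tag) == primary(want):
                return tag, "generic"

    # Tier 3: fall back to the default.
    return default, "default"
```

With the content and user tags from the example above, the user gets "de-CH" as a generic match instead of the English default.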
Yes, I picked off an easy example for which the 'matching' section of the draft didn't seem adequate. This really is a tar-pit, of course. Serbo-Croatian used to be a language; now it's Serbian and Croatian. I assume that they are mutually intelligible. Serbian is probably a better substitute for Croatian than some general default (or silence), though saying this in some parts of the world might start wars.
The whole question of what is a language, a variant or dialect of a language, or a suitable substitute for a language, would benefit from some thought in any tagging scheme, though I agree the problem is not generally soluble.
I know of two other wrinkles in the RFC 1766 world:
(1) Matching may want to take into account the distinguished nature of country subtags in some way.
(2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact sufficiently different languages that a primary tag match should not be taken to be a generic match. (Of course this only matters if sign languages are relevant to your situation - in many cases they aren't. In retrospect I think it was a mistake to register sign languages this way.)
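
To make wrinkle (2) concrete, here is a minimal sketch of a pairwise comparison that refuses a generic match for sign-language tags. The names classify_match and NO_GENERIC_MATCH are hypothetical, and treating "sgn" as the only such primary tag is an assumption. It does nothing about wrinkle (1):

```python
# Primary tags whose subtags name genuinely different languages, so that a
# primary-tag match must not count as a generic match (assumption: only "sgn").
NO_GENERIC_MATCH = {"sgn"}


def classify_match(content_tag, user_tag):
    """Return "preferred", "generic", or None for one content/user tag pair."""
    content = content_tag.lower().split("-")
    user = user_tag.lower().split("-")

    # Whole-tag match is always acceptable, sign language or not.
    if content == user:
        return "preferred"

    # Primary-tag match counts only when the primary tag allows it.
    if content[0] == user[0] and content[0] not in NO_GENERIC_MATCH:
        return "generic"

    return None
```

Under these assumptions "de-CH" against "de-DE" is a generic match, while "SGN-FR" against "SGN-EN" is no match at all.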
This proposed revision, however, opens Pandora's box with regard to matching. Consider:
(a) Extension tags appear as the first subtags, and as such have to be taken into account when looking for country subtags.
(b) Script tags change the complexion of the matching problem significantly, in that they can interact with external factors like charset information in odd ways.
(c) UN country numbers have been added (IMO for no good reason), requiring handling similar to country codes.
The bottom line is that while I know how to write reasonable code to do RFC
1766 matching (and have in fact done so for widely deployed software), I
haven't a clue how to handle this new draft competently in regards to matching.
And the immediate consequence of this is that I, and I suspect many other
implementors, are going to adopt a "wait and see" attitude in regards to
implementing any of this.
Ned
-- David Singer Apple Computer/QuickTime