RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions

Dave Singer <singer@xxxxxxxxx> · Tue, 4 Jan 2005 09:58:52 -0800

At 9:14 AM -0800 1/4/05, ned.freed@xxxxxxxxxxx wrote:
This whole question of what 'matches' is subtle.  Consider the case
when I have a document that has variant content by language (e.g.
different sound tracks), and the user indicates a set of preferred
languages.  If the content has "de-CH" and "fr-CH" (Swiss German and
French), and a default "en" (English), and the user says he speaks
"de-DE" and "fr-FR", on the face of it nothing matches, and I fall
back to the catch-all default, which is almost certainly not the best
result.

David, this isn't the half of it. The case you describe is actually one of the
easy ones, in that it can be handled by doing a "preferred" match on the
entire tag, with a "generic" match on the primary subtag alone taking lower
precedence than the whole-tag match but higher precedence than a fallback to
a default.
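The tiered scheme described above (whole-tag match first, then a primary-subtag match, then the default) might be sketched as follows. This is a minimal Python illustration, not code from either correspondent or from the draft; the names `primary_subtag` and `choose_tag` are hypothetical, and it assumes the user's preferences are listed in priority order.

```python
from typing import List


def primary_subtag(tag: str) -> str:
    """Primary (first) subtag, lowercased; language tags compare case-insensitively."""
    return tag.split("-", 1)[0].lower()


def choose_tag(available: List[str], preferred: List[str], default: str) -> str:
    """Pick one of `available` for a user whose `preferred` tags are in priority order."""
    by_lower = {t.lower(): t for t in available}
    # Tier 1 ("preferred"): whole-tag match, in the user's order of preference.
    for want in preferred:
        if want.lower() in by_lower:
            return by_lower[want.lower()]
    # Tier 2 ("generic"): primary subtag only; lower precedence than tier 1.
    for want in preferred:
        for have in available:
            if primary_subtag(have) == primary_subtag(want):
                return have
    # Tier 3: nothing matched, so fall back to the catch-all default.
    return default
```

With Dave's example, choose_tag(["de-CH", "fr-CH", "en"], ["de-DE", "fr-FR"], "en") returns "de-CH" instead of falling back to "en".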

Yes, I picked off an easy example for which the 'matching' section of 
the draft didn't seem adequate.  This really is a tar-pit, of course. 
Serbo-Croatian used to be a language; now it's Serbian and Croatian.
I assume that they are mutually intelligible.  Serbian is probably a
better substitute for Croatian than some general default (or
silence), though saying this in some parts of the world might start
wars.

The whole question of what is a language, a variant or dialect of a
language, or a suitable substitute for a language, would benefit from
some thought in any tagging scheme, though I agree the problem is not
generally soluble.

I know of two other wrinkles in the RFC 1766 world:

(1) Matching may want to take into account the distinguished nature
   of country subtags in some way.

(2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact
   sufficiently different languages that a primary tag match should not be
   taken to be a generic match. (Of course this only matters if sign
   languages are relevant to your situation - in many cases they aren't.
   In retrospect I think it was a mistake to register sign languages this
   way.)
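Wrinkle (2) amounts to an exception list for the primary-subtag ("generic") step of matching. A minimal sketch, assuming that for a sign-language tag it is the second subtag that identifies the language; the set `NO_GENERIC_MATCH` and the function `generic_match` are hypothetical names used only for illustration.

```python
# Hypothetical exception list: primary subtags for which sharing the primary
# subtag says nothing about mutual intelligibility.
NO_GENERIC_MATCH = {"sgn"}


def generic_match(have: str, want: str) -> bool:
    """Return True if `have` is an acceptable primary-subtag match for `want`."""
    h = have.lower().split("-")
    w = want.lower().split("-")
    if h[0] != w[0]:
        return False
    if h[0] in NO_GENERIC_MATCH:
        # SGN-FR and SGN-EN are different languages: require the second
        # subtag to agree as well, not just the shared "sgn" prefix.
        return h[:2] == w[:2]
    return True
```

So generic_match("de-CH", "de-DE") is true, while generic_match("SGN-FR", "SGN-EN") is false.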

This proposed revision, however, opens Pandora's box with regard to matching.
Consider:

(a) Extension tags appear as the first subtags, and as such have to
   be taken into account when looking for country subtags.

(b) Script tags change the complexion of the matching problem significantly,
   in that they can interact with external factors like charset information
   in odd ways.

(c) UN country numbers have been added (IMO for no good reason), requiring
   handling similar to country codes.
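For (b) and (c), finding the country subtag now means looking past an optional script subtag and accepting a three-digit UN region number wherever a two-letter country code could appear. The sketch below assumes the subtag shapes later standardized in RFC 4646 (four letters for a script; two letters or three digits for a region) rather than anything specific to draft -08, and it does not attempt the extension-subtag case in (a). `find_region` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed shapes (as in RFC 4646), not necessarily those of draft -08.
SCRIPT = re.compile(r"[A-Za-z]{4}")           # e.g. "Latn", "Hant"
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")  # ISO 3166 alpha-2 or UN M.49 number


def find_region(tag: str) -> Optional[str]:
    """Return the region subtag of `tag`, uppercased, or None if there is none."""
    subtags = tag.split("-")
    i = 1  # subtag 0 is the primary language subtag
    if i < len(subtags) and SCRIPT.fullmatch(subtags[i]):
        i += 1  # skip an optional script subtag
    if i < len(subtags) and REGION.fullmatch(subtags[i]):
        return subtags[i].upper()
    return None
```

Under these assumptions, "zh-Hant-TW" yields "TW" and "es-419" yields "419", so a matcher that compares regions has to treat both forms alike.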

The bottom line is that while I know how to write reasonable code to do RFC
1766 matching (and have in fact done so for widely deployed software), I
haven't a clue how to handle this new draft competently with regard to
matching.

And the immediate consequence of this is that I, and I suspect many other
implementors, are going to adopt a "wait and see" attitude with regard to
implementing any of this.

				Ned

--
David Singer
Apple Computer/QuickTime

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf