RE: Last Call on Language Tags (RE: draft-phillips-langtags-08)

"JFC (Jefsey) Morfin" <jefsey@xxxxxxxxxx> · Tue, 04 Jan 2005 17:06:43 +0100

Dear Peter,

I am sorry to comment on this again, but this is a Last Call on a private
proposal. There is no other forum in which to comment on this key document
for the future of the Internet. There is also no other forum in which to
correct what you say about me.

I wish to recall that the main issues are the draft's claim to obsolete
RFC 3066 while sometimes conflicting with it, and its extension of scope
without limit (cf. Addison Phillips' comment), which would amount to an
IESG commitment on the whole multilingual Internet architecture.

I also wish to underline that I agreed with you on many points during the
private list discussion and in private mails.

At 03:58 04/01/2005, Peter Constable wrote:
For the past several years the majority of my work has been related to
standards pertaining to IT globalization in one way or another, and I
have encountered a few nexus of people interested in metadata elements
for describing linguistic properties of content; a number of the people
I have encountered in these contexts have congregated (metaphorically)
on the IETF-languages list, and a number of those have provided input on
this draft.

Hopefully a few people have congregated to support the proposed draft. Now
their positions have to go through a consensus process. If your proposals
harm no one and are useful to some, there is no reason not to reach a
consensus. Today the consensus inclines to say that they might be harmful
and should therefore be reserved to those who need them. The concern is
that this might make them irreversibly incompatible, and that the benefits
(for them and for others) are not clear. The aim is to try to clarify.

In each of these contexts, I have encountered general agreement with the 
idea that it is appropriate to include writing-system distinction as part 
of language tags; after some time, it has only been in the past couple of 
weeks that I have encountered people who have questioned the decision to
incorporate script IDs, and all of these have
been people who have not been subscribed to the IETF-languages list, or at 
least have not been active contributors to discussion on that list.

I suppose I am among those "seldom" newcomers. So let me comment on that:

1. I incorporated my organization supporting international users' needs in 1978 :-)

2. I never objected to the script ID. I objected that it was not given the
same importance as the language and country codes. I have pleaded (and
acted) for 25 years for the support of authoritative distinctions among
user contexts. But I am not paid by a big employer.

3. I objected to the scarcity of possible tags.

4. I objected to the exclusiveness of a registration approach versus a
description approach.

5. I supported the proposed scheme as long as its scope of application was
defined and was not a takeover of the multilingual Internet.

Last but not least, I received enough off-list support to accept spending
time on this. There is NO consensus in the community, and there are huge
technical, societal, economic and political concerns, because one does not
understand what the Draft wants to achieve, for whom, and how. The main
request is to clarify. There are no real objections (except to the paucity
of the proposal), only concerns.

> It would be very helpful, to me at least, if you or he could
> identify the specific context in which such tags would be used
> and are required.  The examples should ideally be of
> IETF-standard software, not proprietary products.

You name none in response, just an application-level problem.

I've used Chinese as one example, but there are many other cases, some
familiar to many and some less well known. Also, in relation to IETF
protocols, I mentioned only HTTP, but the same problems likely exist for
other protocols involving textual linguistic content where RFC 3066 is
used. For example, in searching for items in an LDAP directory, it may
be appropriate for an AttributeDescription to specify Traditional Chinese
rather than Simplified Chinese, or Serbian using the Latin-script
orthography vs. Serbian using the Cyrillic-based orthography.

Full agreement. But this is to be done through open and inclusive
semantics, not on an exclusive first-come, first-served registration basis.
The next step will be patents on language descriptions.
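
To make the script distinction discussed above concrete, here is a minimal
sketch of splitting a tag into language, script and region subtags. It
assumes a simplified language[-script][-region] shape (2-3 letter language,
4-letter script, 2-letter or 3-digit region) and ignores variants,
extensions and private-use subtags; the names split_tag and
same_writing_system are hypothetical helpers, not taken from the draft.

```python
import re

# Simplified shape: language[-script][-region]. Variants, extensions and
# private-use subtags are deliberately out of scope for this sketch.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Return (language, script, region); missing subtags are None."""
    m = _TAG.match(tag)
    if m is None:
        raise ValueError(f"unsupported tag shape: {tag!r}")
    lang, script, region = m.group("language", "script", "region")
    return (
        lang.lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )


def same_writing_system(tag_a, tag_b):
    """True when both tags name the same language and the same script."""
    return split_tag(tag_a)[:2] == split_tag(tag_b)[:2]
```

With this sketch, "sr-Latn" and "sr-Cyrl" compare as different writing
systems, while "zh-Hant-TW" and "zh-Hant-HK" compare as the same one. What
an application does with that distinction is left open here.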

In ideal terms, I do not think that all of the complexity of the
proposed draft is needed.

So let us simplify it, and let us dig into the areas where complexity comes
from limited possibilities.

On the other hand, I think that some people's
characterization of the excessive complexity has been overstated, some
of the complexity I consider superfluous but not particularly harmful
(notably the extensions), and some of the complexity I think is an
unfortunate result of existing implementations and past practice (in
particular, the steps taken to avoid instability of ISO 3166 and the use
of both UN numeric IDs and ISO 3166 due to the combination of prior
usage of ISO 3166-1 together with the need for region identifiers other
than those provided by ISO 3166-1).

Complexity in real life issues comes often from:
- patching previous mistakes
- patching the rigidity introduced by previous simplifications.

Part of my reluctance to have script IDs included in RFC 3066 was due to
the fact that a set of tags had just been registered (some of which I
now wish didn't exist) which used various subtags in combination, and I
sensed that there was a lack of collective understanding of what the
internal structure of tags and relationships between subtags should be
(which is a direct cause that led me to write the paper I referred to
earlier).

This documents that "collective community thinking" is not always correct.
This is why I am reluctant to accept any registration process built on a
(proprietary?) blend of existing open tag lists.

I have been party to the review process for the past five or so years,
and can say that the review process did not, IMO, always succeed in
avoiding regrettable tags (I do not consider those that include script
IDs to be among them) because there was a lack of a model of what
ontology was needing to be described and what the appropriate elements
within a tag standing in what kind of relationship to one another were
needed. This draft doesn't describe such a model, but it does impose
one, which I think is moving in a good direction.

!!!!!!!

Do you actually say that the value of this draft is to impose the dearly
missing model ... without describing it?????

Or is it a misreading of mine?

Actually, no; I was trying to guess at existing applications that might
have particular problems with complexity, as you mentioned. Certainly
language-range matching is no more complex in the proposed draft than it
is today. I personally suspect that the language-range matching
algorithm is too simplistic, but I haven't gone beyond that myself to
start suggesting it needs to be replaced with something more complex.

Why do you want there to be an exclusive, _unique_ matching algorithm? It
is up to the application to decide on the algorithm when receiving a tag,
and possibly to negotiate it in a web service or when invoking an OPES.
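
As an illustration of the two positions above, the following minimal sketch
implements prefix matching in the style of RFC 3066, section 2.5 (a range
matches a tag if it equals the tag, or equals a prefix of the tag that is
immediately followed by "-"), and lets the caller swap in a different
matcher. The function names and the "matcher" parameter are hypothetical,
chosen only for this example.

```python
def rfc3066_matches(language_range, tag):
    """Prefix matching in the style of RFC 3066, section 2.5."""
    rng, tag = language_range.lower(), tag.lower()
    if rng == "*":
        # The wildcard range matches any tag.
        return True
    # Equal, or a prefix that ends exactly at a "-" boundary.
    return tag == rng or tag.startswith(rng + "-")


def exact_matches(language_range, tag):
    """A stricter, application-chosen alternative: whole-tag equality."""
    return language_range.lower() == tag.lower()


def select(available, language_range, matcher=rfc3066_matches):
    """Filter the available tags with whichever matcher the caller picks."""
    return [t for t in available if matcher(language_range, t)]
```

For example, select(["sr-Latn", "sr-Cyrl", "en"], "sr") keeps both Serbian
tags under prefix matching, and keeps none when called with
matcher=exact_matches. Passing the matcher as a parameter is just one way
to leave the choice of algorithm to the application.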

For my part, I made a point of informing TC 37 members of the
re-assignment of CS, and that led to a resolution at our Paris meeting
last August expressing strong concern over this. I did not ever hear any
response from either TC 46 or the ISO 3166 MA on this matter, however. I
don't know that I would have devised the approach to the handling of
this issue used in this draft had I been its author. I am deeply
concerned that stability be ensured in language tags, however, and if
this is the only way to ensure it I can accept it.

We had a long talk at the end of the August Paris meeting at AUF about ISO
639-2 and the need to aggregate language ID, script ID, usage description,
authoritative sources and also country codes, about the complexity of
taking into account "sub-codes" and private codes, and about adding
accidental or new descriptors in order to document vernacular ways of
speaking, thinking, talking. Obviously it was a private discussion with a
few people sharing the same ideas ... Maybe you were there (we were the
last to leave the room and the building).

All the best.
jfc
