Re: Last Call: 'Tags for Identifying Languages' to BCP

"JFC (Jefsey) Morfin" <jefsey@xxxxxxxxxx> · Mon, 29 Aug 2005 02:33:30 +0200

Dear Bruce,
I will try to comment on, respond to, and make suggestions about some of your well-made points.

At 16:15 28/08/2005, Bruce Lilly wrote:
>  Date: 2005-08-25 20:55
>  From: "JFC (Jefsey) Morfin" <jefsey@xxxxxxxxxx>
> the privacy problem is the "what you read, who you are" intelligence
> leak.

That is to some extent true of any negotiation mechanism and negotiated
value.

True. The problems are:
- the unnecessary accumulation of orthogonal information
- the easily identified characteristic format: an enormous difference
between "xx-xx-xxx-xx" (Draft) and "xxxx" (ISO 639-6)
- the lack of alternatives (are we sure there is no other
architectural way to address the same need without an information leak?)
- the lack of encryption
- the "spam" aspect: I am forced to receive the langtag.

> Today langtags are not yet much used (say the W3C people in the
> WG-ltru) when compared with  what they should in XML, HTML, etc.

XML, HTML, etc. are not IETF protocols and should not be the main
consideration in IETF work on IETF documents,

They are specifically cited by the Charter. Also, CLDR is a private
proposal to unify "locale" files, which has merit but also competition.

especially as language tags
are heavily used by IETF protocols, notably MIME (RFCs 2045, 2047, 2231,
3282) and widely-deployed core IETF application protocols which use MIME
(e.g. the Internet Message Format and its applications (email, news, voice
messaging, EDI, etc.) and HTTP and applications using HTTP as a substrate).

RFC 2231 is among the references cited. I am more interested in R&D. My
concern is that OPES has been disregarded.

> This
> is all what this proposition is about. This proposition is to give
> _one_shot_ in a _standardised_ way the language, the script and the
> country.

This was discussed during Last Call of the previous non-IETF (individual
submission) attempt.  IIRC David Singer brought up several examples of
other pieces of information (e.g. legal/copyright variations) that could
also be negotiated and which might affect the presentation of content (or
choice among alternative content).  Lumping all of these separate items into
one tag is a poor design as it impedes negotiation and tends toward lengthy
tags which are incompatible with fixed-length mechanisms such as MIME
encoded-words.  While there is some mention of this issue in the document
under discussion, its treatment and resolving the underlying issue in a
manner that would minimize the problems are lacking.

The work we carried out on language in a common reference center (where
the common parameters of a relation are stored) showed us that two
classes of additional information must be included in negotiation: the
parameters in the community (what we call the referent, i.e. dictionary,
etc.) and the context of the exchange (style, personal meanings,
circumstances, etc.). These elements are necessary for OPES call-out
servers supervising a relation. These elements are by default used by
... Word (language, script, country, dictionary, style).

The Draft proposes a system which makes it possible to evaluate the
locale the computer should support for end-to-end interoperability
purposes. It does not necessarily make it possible to establish,
maintain and serve brain-to-brain interintelligibility.

Let's separate three issues:
1. privacy
2. tagging
3. negotiation

The privacy issue exists whenever any information is conveyed; the user
needs to balance privacy concerns with facilitation of communication.
Mechanisms such as TLS can be used to limit the visibility of the information
to the end points of communication; ultimately it boils down to a matter of
trust in the end-point partner in the communication exchange.  I believe
that the issue is dealt with adequately in the security considerations
section of the document under discussion (some mention of transport-level
protection of privacy would be welcome).

Not really: see above. The concept is an aid to privacy violation:
- more secure alternatives should be investigated and proposed
- the result is not worth the danger; necessary information is missing.

Tagging identifies characteristics of a particular piece of content.  For
that purpose alone, it makes little difference (other than regarding the
aforementioned compatibility issues with existing IETF mechanisms) whether
the characteristics are lumped or separate.  There are existing IETF
mechanisms which permit handling of either lumped or individual
characteristics (e.g. the extensible header field mechanism of RFC 2045 and
the "feature/filter" mechanism of RFC 2533/2738/2912).  Tagging per se
identifies characteristics
of content.  While that may be used to infer something about the content
provider, such inferences may be unreliable, particularly for providers that
support a wide variety of characteristics for the content in question.

This confusion will be an increasing problem. More and more, the
"architext" we use (the data from which we infer the text we read)
is becoming intelligent and multilingual. I currently use a multilingual
site generator. This means that it uses multilingual texts to
generate a unilingual version of a web site. It uses a default langtag
scheme (:xxx) to indicate the language of the lingual parts.

Negotiation of characteristics is where several issues arise.  One such
issue, as discussed here in December 2004/January 2005, relates to an
algorithm for matching content characteristics (e.g. between a particular
piece of content and a specified range of acceptance, as in an RFC 3282
Accept-Language field).  RFC 3066 skirted that issue as it stopped short of
specification of an algorithm, and as it specified a mere two particular
characteristics (language per se, and country) which could be combined in
a tag.  That was not true of the individual submission, which combined at
least 5 characteristics and specified an algorithm.  As a result of issues
with that approach, the LTRU WG was established with a charter to produce a
BCP (for registration procedures) and a separate Standards Track document
for topics such as algorithms which are unsuitable for BCP.  A related issue
is the interaction of the established negotiation mechanism (viz. the RFC
3282 Accept-Language field) and potential use of the other (feature/filter)
mechanism for negotiation.  The Accept-Language field provides for
specification of language ranges and for associating a preference value
with specific languages (as defined in RFC 3066) or ranges.  The proposed
mechanism in the individual submission of late last year (essentially
unchanged in the LTRU product (see discussion below)) does not address the
language range issue, and that issue is greatly complicated by conflating
separate characteristics into a single tag.  Addressing the language range
issue is not a WG work item and, unfortunately, the algorithm issue is
scheduled to be a later work item than the registry issue.
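
To make the language range issue concrete, here is a minimal Python sketch of range matching with preference values. It assumes the prefix-matching rule of RFC 3066 (a range matches a tag if it equals the tag, or equals a prefix of the tag that is followed by "-") and the usual "q" parameter of an Accept-Language field. The function names are hypothetical and nothing here is taken from the document under discussion.

```python
def parse_accept_language(header):
    """Parse e.g. 'da, en-gb;q=0.8, en;q=0.7' into [(range, q), ...]."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # a range without a q parameter gets the default weight
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((parts[0].lower(), q))
    return ranges


def range_matches(lang_range, tag):
    """Prefix matching: the range equals the tag or a '-'-delimited prefix."""
    lang_range = lang_range.lower()
    tag = tag.lower()
    return (lang_range == "*" or tag == lang_range
            or tag.startswith(lang_range + "-"))


def preference(header, tag):
    """Return the q-value of the most specific matching range, else 0.0."""
    best_len, best_q = -1, 0.0
    for lang_range, q in parse_accept_language(header):
        if range_matches(lang_range, tag):
            length = 0 if lang_range == "*" else len(lang_range)
            if length > best_len:
                best_len, best_q = length, q
    return best_q
```

Under these assumptions, the range "en-us" matches the tag "en-US" but not a tag such as "en-Latn-US", where a script subtag sits between language and country. That is one way to see how conflating characteristics in a single tag complicates simple prefix matching.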

The language negotiation issue is independent of any language
identifier format. But obviously some langtag formats may serve
language negotiation better than others.

Added to that is the fact that the specification of the tag format is mixed
with registration procedures.  Negotiation of separate characteristics is much
simpler than that of a combined conflation of characteristics; each
characteristic can be assigned separate preference values, and irrelevant
characteristics (e.g. script w.r.t. spoken language) can be easily ignored.
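
A minimal Python sketch of negotiation over separate characteristics follows. It is only an illustration under stated assumptions: the characteristic names ("language", "script"), the dictionary layout, and the rule of multiplying per-characteristic preference values are all hypothetical and are not taken from any RFC or from the document under discussion.

```python
def score(content, prefs):
    """Multiply the preference values of the characteristics the receiver
    cares about; characteristics absent from prefs are ignored."""
    total = 1.0
    for characteristic, accepted in prefs.items():
        value = content.get(characteristic)
        # "*" is an optional catch-all; an unlisted value scores 0.0
        total *= accepted.get(value, accepted.get("*", 0.0))
    return total


def best(variants, prefs):
    """Return the highest-scoring variant, or None if nothing is acceptable."""
    ranked = sorted(variants, key=lambda v: score(v, prefs), reverse=True)
    if ranked and score(ranked[0], prefs) > 0:
        return ranked[0]
    return None


variants = [
    {"language": "sr", "script": "Cyrl"},
    {"language": "sr", "script": "Latn"},
    {"language": "en", "script": "Latn"},
]

# A reader states preferences for both language and script.
reader = {"language": {"sr": 1.0, "en": 0.5},
          "script": {"Latn": 1.0, "Cyrl": 0.3}}

# A listener (spoken language) simply omits the script characteristic.
listener = {"language": {"sr": 1.0}}
```

With these sample values, `best(variants, reader)` selects the sr/Latn variant, while for `listener` both sr variants score equally because script plays no part in the comparison.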

At this stage many negotiation elements are missing. The elements
related to the referent and to the context are missing. For example, a
traveler will more easily accept a foreign language when it comes to
the location he tours (context), and a professional when it comes to
a technical discussion (referent). All the more so since terminology
OPES services or on-the-fly translation assistance can be provided.

As negotiation and related issues represent a critical technical issue for
the design of language tags (viz. keeping separate characteristics out of
*language* tags), it is essential that such negotiation issues be considered
carefully before specifying the format of tags.  Unfortunately, that has not
been done, and considering the published WG milestones it appears that that
issue has not been taken into consideration.  It should be pointed out that
such issues have been raised, both in the discussion during Last Call of the
individual submission and as a result of subsequent work.  However, it
appears that the WG has not considered the issues, with the effect that the
WG product lacks the "particular care" expected of BCP documents (RFC 2026).

It should be noted that the ISO 639-4 work is about guidelines in
that area. This work is under way and was not considered.

Note that it is not the registration procedural issues that are typical of
BCP documents that are problematic; rather it is the conflation of separate
characteristics into a single tag syntax, specified in the same document,
which raises problems related to content negotiation.

Part of the problem is the scheduling of WG work items as noted above
(viz. negotiation issues are critical to design of tag syntax, and should not
have been deferred until after syntax specification).  Another large part of
the problem is WG management; in addition to the issues raised by John
Klensin the last time that LTRU participation was discussed on the IETF
discussion list -- and with which I wholeheartedly agree -- it appears that
management of WG participant conduct has been rather lax; proponents of the
individual submission effort who are participating in the WG tend to resort
to ad-hominem attacks when a problem is identified or when an alternative
approach is raised, with no visible intervention by the WG co-chairs.  That
has also (i.e. in addition to the factors which John identified) had the
effect of limiting WG participation by individuals.

I will not object to that remark. The advantage was that proposing an
alternative approach resulted in an improvement of the ABNF, made in
order to block it. The result is a relatively clean default ABNF which
now avoids confusion with specific solutions introduced by reserved
singletons. This makes it possible:
- to support my Draft as a default proposition
- to specify other formats and conceptions easily (such as ones based
upon ISO 639-6, or ISO 11179 conformant, etc.) without risking conflicts.

Specification of "language" tag syntax which conflates other content
characteristics prior to open and professional discussion of negotiation
issues and alternative approaches would be a premature lock-in of a design
choice.  As the document under discussion specifies a conflation of such
characteristics without open discussion -- indeed hampered by unchecked
unprofessional conduct -- it should not be approved as BCP in its current
form.  Separation of syntax specification to a separate document,

Yes!!!

to be specified after due consideration of negotiation issues, leaving purely
procedural issues of registration,

Yes!!! supporting multimodal competences (not only script, but also 
signs, voice, icons, moods, style, etc.)

would be one approach to enable making
a decision on BCP registration procedures independently of and in advance of
a concrete specification of negotiation issues and tag syntax.  However,
as it stands, the document cannot be evaluated for soundness of the tag
syntax design in the absence of a specification that addresses negotiation
issues (in a backwards-compatible manner with the existing negotiation
mechanisms, viz. MIME Content- and Accept- fields and feature/filter
negotiation).

Therefore, at minimum, I recommend that the IESG defer a decision on the
subject document until such time as the full impact of the early design
choice to conflate multiple characteristics into a single tag can be fully
evaluated w.r.t. proposed matching algorithms and impact on existing
IETF-approved negotiation mechanisms.

At that time we should have running services. The ISO 639-6 authors just
announced that a sample table will be available in November. And the ISO
639-3 author expects it to be published by the end of the year. Then
we can start experimentation. Locking the multilingual internet core
system into a final ABNF seems premature.

 Revision to move the syntax
specification to a separate document, as mentioned above, would permit
evaluation of the registration procedures per se independently of such
concerns, and would be one way to move forward on those registration
procedures quickly (i.e. independently of analysis of the syntax design)
if that is deemed desirable.

Aside from that, the IESG (via the cognizant ADs) should address the issues
of WG charter work items and milestones as they relate to consideration of
negotiation issues prior to locking down a tag syntax specification, should
emphasize the importance of backwards compatibility with established,
approved, and widely deployed IETF protocols and mechanisms,

and documented efforts such as OPES, and should document the way
langtags will be used and their applications.
jfc
