Re: Last Call: 'Tags for Identifying Languages' to BCP

Bruce Lilly <blilly@xxxxxxxxx> · Tue, 30 Aug 2005 21:59:59 -0400

>  Date: 2005-08-28 16:25
>  From: Frank Ellermann <nobody@xxxxxxxxxxxxxxxxx>

> That's a last call, if you have better ideas than those in the
> draft speak up.  Your Content-Script idea is good, but won't
> help e.g. in encoded words (2047+2231).

Encoded-words have several characteristics, one of which is limited
length (in octets).  That has two implications w.r.t. script:
1. specifying script explicitly is unnecessary; it can be determined
   from the charset (always specified in an encoded-word) and the
   specific octets of the encoded text (ISO-8859-1 is Latin script,
   KOI8 is Cyrillic, etc.).
2. an encoded-word has limited space available.  Of a maximum of 76
   octets in an encoded-word specifying language, there are 8 for
   overhead, at least one (currently exactly one) for specification
   of encoding method, a charset specification (registered charsets
   have names up to 45 octets in length), the language tag, and some
   encoded text.  The encoded text must be at least one octet for Q
   encoding and a simple (unshifted) charset; for B encoding (and an
   unshifted charset) it has to be a multiple of 4 octets, and a typical
   charset with shift sequences will require on the order of 6 octets
   minimum (for Q encoding; 8-12 minimum for B encoding). Specifying
   (unnecessarily; see above) script reduces the available space for
   actual (encoded) text, possibly to the point of impossibility in
   pathological cases.
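
The space budget in point 2 can be sketched as simple arithmetic.  This
is an illustration only: it assumes the 75-character limit stated in
RFC 2047 and counts seven delimiter characters for the RFC 2231 form
=?charset*language?encoding?encoded-text?= (the figures of 76 and 8
given above leave the same 68 octets).  The function name, the charset,
and the sample tags are hypothetical choices for the example.

```python
# Sketch of the octet budget for an RFC 2047 encoded-word that carries
# an RFC 2231 language tag:  =?charset*language?encoding?encoded-text?=

MAX_ENCODED_WORD = 75  # assumed limit, per RFC 2047 section 2
# Fixed delimiters: "=?", "*", "?", "?", "?="
DELIMITERS = len("=?") + len("*") + len("?") + len("?") + len("?=")


def text_budget(charset: str, language: str, encoding: str = "Q") -> int:
    """Octets left for encoded-text after all other fields are counted."""
    return (MAX_ENCODED_WORD - DELIMITERS - len(encoding)
            - len(charset) - len(language))


# A longer language tag (e.g. one that adds a script subtag) leaves
# fewer octets for the encoded text itself.
for tag in ("sr", "sr-Latn", "sr-Latn-CS"):
    print(tag, text_budget("ISO-8859-2", tag))
```

With a 45-octet charset name, the same calculation leaves very little
room, and B encoding further restricts the encoded text to multiples of
4 octets.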

Specification of script is only a performance enhancement for long texts
(not the case for encoded-words) where a multi-script charset is in use.

While the Content-Script (or similar feature/filter mechanism) would not
be applicable to encoded-words, specification of script is unnecessary
for encoded-words (and undesirable due to impact on the available text
space).
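
As a rough sketch of how script can be determined from the charset and
the octets alone, the following decodes the octets and takes the first
word of each letter's Unicode character name as a stand-in for its
script.  That heuristic and the function name are assumptions made for
illustration; it is not a mechanism defined by any of the documents
under discussion.

```python
import unicodedata


def scripts_used(octets: bytes, charset: str) -> set[str]:
    """Return rough script labels for the letters in the decoded text.

    Heuristic: the first word of a letter's Unicode name (LATIN,
    CYRILLIC, GREEK, ...) is treated as its script.
    """
    text = octets.decode(charset)
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


# No explicit script label is needed; charset plus octets suffice.
print(scripts_used("Привет".encode("koi8-r"), "koi8-r"))
print(scripts_used("café".encode("iso-8859-1"), "iso-8859-1"))
```

Text that mixes scripts yields more than one label, which is the case
where no single script indication can be given.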

Specification of script is only possible where a given text uses a single
script, and that limitation applies to any of the methods of indication
mentioned above, including the addition to language tags proposed by the
draft under discussion.

Script is a characteristic of written text; it is not applicable to (e.g.)
audio media types.  It really should be a text media type parameter (or
feature).

> This is a ready-for-Bruce's-review draft as far as I can judge
> this, but for obvious reasons only you can really judge it. ;-)

As I mentioned in an earlier message, without a concrete specification
for negotiation, it is not possible to fully assess the proposed syntax
changes.

> > Addressing the language range issue is not a WG work item
> > and, unfortunately, the algorithm issue is scheduled to be a
> > later work item than the registry issue.
> 
> Only my personal view of course, but the matching draft offers
> a syntactical form for ranges,

There is no such draft in Last Call at this time, as far as I know.

> if ISO 3166-1 pulls another CS 3066bis will handle it
> better than 3066 (no potential worldwide retagging confusion).

I am unaware of any "worldwide retagging confusion" w.r.t. language
tags and "CS".

> > it appears that management of WG participant conduct has been
> > rather lax
> 
> IBTD, the WG Chairs and the responsible AD did a very good job.

As an affected party, I disagree.

> > Revision to move the syntax specification to a separate
> > document, as mentioned above, would permit evaluation of the
> > registration procedures per se
> 
> You can also read chapter 3 per se, the mentioned 14 pages
> plus 3.1 as introduction (5 pages, format of the registry).

But a single section isn't being Last Called; it is the entire document,
and lacking specification of negotiation mechanisms it is not possible
to fully assess the document as it stands.
