Re: Ietf-languages Digest, Vol 24, Issue 5

Gentlemen,
I see several points discussed here which are not all of the same order and which seem to confuse the issue.


1. The discussion is creeping from Harald's RFC 3066 to the Multilingual Internet. It seems strange to discuss byte-oriented details without first having a multilingual framework that states the scope of the discussion and its implications (which are certainly major) for the whole Internet architecture. I submit that IAB guidance is needed first. Before going any further, a true WG-Multilingualism should be created and opened to everyone (private IETF-Languages lists should be an interim arrangement on the way to such a WG).

2. I see "RFC 3066bis" quoted as a document, but the RFC Editor does not seem to know of that RFC. Where can I find it?

3. There are at least four different levels:

- what Multilingualism is vs. vernacularism (there are 6000 human languages, but a standard should be able to support unscripted, computer-generated, and past languages, which may lead to millions of references).

- vernacular granularity, which has nothing to do with geography and countries: how it fits into the general digital convergence (is the IANA the proper registry?), and its time relation to other standards, which calls for a kind of "time hierarchy", i.e. a consistent cross-standard rule on how to support permanence.

- the semantics of the tag itself. Judging from the various exchanges this seems manageable, but it calls for a single, comprehensive, maintained document for parsers (programs and people who have to sort out language issues), which differs from the RFC culture.

- the multilingual (not uni- or bilingual) tag description, which is necessary so that the cultures and databases associated with different languages can identify the same language.

IMHO the sub-tag granularity should match real-life and research granularity. This does not mean that each tag must be registered, but that each sovereign, authoritative, or historical language- or culture-oriented source must be able to register its own sub-tags and, among themselves, to reduce possible conflicts (the trade-off is between a standard that would conflict with reality and a flexible standard that is resilient to real-world conflicts). Just as the IANA is not in the business of defining countries (Jon Postel, RFC 1591), it should not be in the business of defining languages.

I also submit that the IANA is no longer the proper place to support such a registry. Experience has shown that the IANA (now a function of ICANN) is subject to controversy in this and in parallel real-life areas: ccTLD delegation, ccTLD entries in the root file, the accepted MINC reaction to the uncoordinated Polish introduction of Arabic, Russian, and Hebrew tables, the ICANN strategy favoring internationalized rather than multilingual TLDs, etc. I also submit that UNESCO, MPEG, or other standards or cultural organizations involved in daily reality (universities, publishers, postal services, governments, copyright bodies, WIPO, etc.) are more directly concerned, and may make their own standard prevail after an unnecessary and exhausting dispute. It seems that any semantics able to support open sub-tags, wherever they originate, is useful. Going any further would favor a declining [unilingual or internationalized] network-centric market against the market's evolution toward user-centric [multilingual/multicultural] networked relations [P2P, VoIP, NAT, coreboxes, OPES, etc.].

One possible outcome of the current IETF administrative reform could be to give the IETF a structure that could be funded, allowing it to take part as such in harmonization negotiations. Given the importance of the matter, it will most probably be addressed at the WSIS level, since it is one of the WSIS priorities. I therefore submit that we set byte-oriented details aside for the time being, keep this proposal as a draft, and use it to open a dialog with the WGIG about who should do what with whom. We can certainly wait one more year for a globally accepted approach, which will save everyone a huge amount of time and money.

This dialog is not easy, as we have no direct IETF/IAB/IESG representative there (except Avri, who is multilingual but represents Civil Society). But there are enough WGIG members who are interested in Multilingualism and technically competent enough to understand, comment on, and make progress on this file. These posts and the following URLs should give them a comprehensive understanding of what is at stake.

Draft author's comments: why the draft.
http://www1.ietf.org/mail-archive/web/ietf-announce/current/msg00755.html

Text of the Draft:
http://www.ietf.org/internet-drafts/draft-phillips-langtags-08.txt

Text of RFC 3066:
http://ietf.org/rfc/rfc3066.txt?number=3066

If there are other links to present, I am interested in collecting them and in publishing them on various Multilingualism oriented sites.
Thank you.
Jefsey Morfin




On 01:54 11/12/2004, Bruce Lilly said:

> RE: New Last Call: 'Tags for Identifying Languages' to BCP
>  Date: 2004-12-10 16:37
>  From: "Peter Constable" <petercon@xxxxxxxxxxxxx>
>  To: ietf-languages@xxxxxxxxxxxxx
>
> Bruce Lilly's message makes several inaccurate statements against the
> proposed draft, and misrepresents some of the changes being made. My
> main concern is that I don't know where it was circulated. I might be
> wrong, but I get the impression it was written with a different audience
> in mind and then copied here.
>
>
>
> > -----Original Message-----
>
> > > There are problems with the RFC 3066 definition of generative tags,
> > > however. The ISO 639 and ISO 3166 standards are not freely available
> > > and evolve over time.
> >
> > Accessibility has not been a problem for this implementor...
>
> I agree with Bruce, that accessibility of ISO 639 and ISO 3166 has not
> been the issue. Unfortunately, his comments do not indicate what the
> real issues were.

My comments are in response to the "New Last Call" made on
the ietf-announce list.  They are in response to the text which
accompanied that new last call and the text of
draft-phillips-langtags-08.txt dated November 2002.  The
specific claim that accessibility has been a problem was made in
the text accompanying the new last call (q.v.).  For those not
subscribed to the ietf-announce list, the text of the new last
call can be seen at
http://www1.ietf.org/mail-archive/web/ietf-announce/current/msg00755.html


> > > The largest change in the specification is that it modifies the
> > > structure of the language tag registry. Instead of having to obtain
> > > lists of codes from five separate external standards...
> >
> > Contrary to the implicit claim, the ISO documents mentioned
> > above comprise two standards (available in two languages each),
> > not "five separate external standards".
>
> RFC 3066 made reference to ISO 639-1, ISO 639-2 and ISO 3166-1; the
> proposed replacement adds ISO 15924. I would count that as four ISO
> standards. Up-to-date code tables for all four are readily available.

For the purpose of implementing validation of language tags,
the ISO 639 list includes both the 2- and 3-character codes in a
single document.  The claim (again from the text accompanying the
new last call) is that the draft differs from 3066 in that 3066
(the text alleges) requires "lists of codes from five separate
external standards"; in fact, two lists suffice for 3066
implementations.
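The point that two code lists cover the ISO-derived part of RFC 3066 validation can be illustrated with a minimal Python sketch. It assumes tiny hypothetical excerpts (`ISO_639`, `ISO_3166`) in place of the full code lists, and deliberately leaves the IANA-registered "i" prefix, the private-use "x" prefix, and 3 to 8 character second subtags (which come from the IANA registry, not from ISO) outside its scope. The function name `check_iso_subtags` is illustrative only.

```python
import re

# Hypothetical excerpts; a real implementation would load the full lists.
# The ISO 639 list carries both 2- and 3-letter codes in one document.
ISO_639 = {"en", "fr", "de", "eng", "fre", "fra", "ger", "deu"}
ISO_3166 = {"US", "FR", "DE", "GB"}

# RFC 3066 syntax: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
RFC3066_SYNTAX = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def check_iso_subtags(tag: str) -> bool:
    """Return True if the tag is RFC 3066 syntax and its ISO-derived
    subtags appear in the two code lists (sketch only)."""
    if not RFC3066_SYNTAX.fullmatch(tag):
        return False
    subtags = tag.split("-")
    primary = subtags[0].lower()
    if primary in ("i", "x"):
        # IANA-registered or private use: no ISO lookup applies here.
        return True
    if primary not in ISO_639:
        return False
    if len(subtags) > 1 and len(subtags[1]) == 2:
        # A 2-letter second subtag is an ISO 3166 country code.
        return subtags[1].upper() in ISO_3166
    # Longer second subtags are IANA registrations, not checked here.
    return True


print(check_iso_subtags("en-US"), check_iso_subtags("zz-US"))
```

Case is normalized before lookup because language tags are case-insensitive; everything else is a plain set membership test against one of the two lists.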
> I think this is a serious misrepresentation of the intent of the
> proposal: the draft nowhere suggests, let alone declares, that the
> source ISO standards are irrelevant.

A poor choice of words on my part. The text and the draft suggest
that only the proposed new registry should be consulted, and
the draft clearly specifies that the description of all subtags is
to be provided in English (only).

> Rather, the intent of the
> comprehensive registry is to ensure stability in IETF implementations by
> protecting them from unpredictable changes in ISO standards, such as the
> re-definition of "CS" as a country identifier not long ago. The
> denotation of identifiers listed in the registry is based on their
> definition in the ISO standards, not on an informative descriptor
> provided in the registry;

It's not clear to me that the proposal will provide protection
against the whims of politicians.  If the definition of "CS" as
a country code changes again under the proposed scheme,
how is one to determine specifically what some archived
language-tag referred to at some point in time?  I'm not
particularly concerned about that problem, as I am resigned
to instability associated with anything specified by politicians
(and that includes the UN region codes).

> and as Bruce quite clearly pointed out, those
> source standards are readily accessible. So the suggestion that
> implementers will no longer have access to French-language names from
> the source ISO standards simply is vacuous.

But if the proposed new registry's description of "CS" says
"foo" and the ISO standard code list says "bar", what's
an implementor supposed to present to a user as *the*
description associated with "CS"?

> As for concerns of Anglo-centricity, I'm sure that the authors had no
> anti-French motive, and would be open to suggestions as to how that
> could be addressed.

One possibility would be two description fields.  But the
registry would then need a charset closer to ISO-8859-1 than
to ANSI X3.4 as currently specified, or an encoding
scheme.
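The charset problem with a second, French-language description field can be shown with a short Python sketch. The record layout and field names (`Description-en`, `Description-fr`) are hypothetical, and the `&#xHHHH;` escape is just one possible encoding scheme assumed for illustration, not something either RFC 3066 or draft -08 specifies.

```python
# Hypothetical registry record with two description fields; the field
# names are illustrative only and are not defined by any specification.
record = {
    "Subtag": "fr",
    "Description-en": "French",
    "Description-fr": "français",
}


def fits_charset(text: str, charset: str) -> bool:
    """Return True if text can be encoded in the given charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def escape_non_ascii(text: str) -> str:
    """Assumed encoding scheme: replace each non-ASCII character
    with an &#xHHHH; reference so the record stays pure ASCII."""
    return "".join(c if ord(c) < 0x80 else f"&#x{ord(c):04X};" for c in text)


for field in ("Description-en", "Description-fr"):
    value = record[field]
    print(field,
          "ASCII:", fits_charset(value, "ascii"),
          "ISO-8859-1:", fits_charset(value, "iso-8859-1"),
          "escaped:", escape_non_ascii(value))
```

The French name fails the ASCII (ANSI X3.4) check but fits ISO-8859-1, while the escaped form fits ASCII at the cost of needing a decoding step on the implementor's side.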

> Surely, though, this is not a technical argument
> against the proposal.

Not purely technical, though it presents problems for
existing implementors who provide bilingual support.
Eliminating bilingual descriptions for the language,
country (and UN region) codes leaves implementors
in a quandary.

> > The ABNF in the draft permits all of the following tags which
> > are not legal per the RFC 3066 ABNF:
> >     supercalifragilisticexpialidoceus
> >     y-----
> >     x1234567890abc
> >     a123-xyz
>
> In fact, none of these is permitted by the ABNF of the draft.

ABNF from the draft:

   Language-Tag = (lang
                   *("-" extlang)
                   ["-" script]
                   ["-" region]
                   *("-" variant)
                   *("-" extension)
                   ["-" privateuse])
                   / privateuse         ; private-use tag
                   / grandfathered      ; grandfathered registrations

   lang             = 2*3ALPHA               ; shortest ISO 639 code
                    / registered-lang
   extlang          = 3ALPHA                 ; reserved for future use
   script           = 4ALPHA                 ; ISO 15924 code
   region           = 2ALPHA                 ; ISO 3166 code
                    / 3DIGIT                 ; UN country number
   variant          = ALPHA (4*7alphanum)    ; registered variants
                    / DIGIT (3*7alphanum)
   extension        = singleton 1*("-" (2*8alphanum)) ; extension subtag(s)
   privateuse       = "x" 1*("-" (1*8alphanum))       ; private use subtag(s)
   singleton        = ALPHA                  ; single letters
                                             ; (except x, which has special meaning)
   registered-lang  = 4*8ALPHA               ; registered language subtag
   grandfathered    = ALPHA *(alphanum / "-") ; grandfathered registration
   alphanum         = (ALPHA / DIGIT)        ; letters and numbers


Note that the RFC 2234 definition of an asterisk in front of
a production (with no adjacent numbers, as is the case in
the "grandfathered" production) means zero or more
repetitions (without upper bound) of the production to the
right of the asterisk. That means that the "grandfathered"
production (which is an alternative in the Language-Tag
production) will match any of the following text tags (comments
to the right separated by a semicolon):
   x  ; ALPHA followed by zero repetitions
   xa ; ALPHA followed by one ALPHA (see alphanum)
   x- ; ALPHA followed by one HYPHEN
   supercalifragilisticexpialidoceus ; ALPHA followed by many ALPHAs
       (see alphanum) (example previously given)
   x1234567890abc ; ALPHA followed by 13 alphanums
       (as previously given)
   a123-xyz ; ALPHA followed by three DIGITs (see alphanum)
       followed by one HYPHEN followed by three ALPHAs
       (example previously given)
   y----- ; ALPHA followed by five HYPHENs (example previously
       given)
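These matching claims can be checked mechanically. The following minimal Python sketch assumes a direct translation of the quoted "grandfathered" production and of the RFC 3066 ABNF (Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)) into regular expressions; the regex names are illustrative. Only the "grandfathered" alternative is tested, since matching any single alternative is enough for the whole Language-Tag production to match.

```python
import re

# Draft -08:  grandfathered = ALPHA *(alphanum / "-")
# RFC 2234 ALPHA is A-Z / a-z; alphanum = ALPHA / DIGIT.
DRAFT_GRANDFATHERED = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# RFC 3066:  Language-Tag   = Primary-subtag *( "-" Subtag )
#            Primary-subtag = 1*8ALPHA
#            Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

EXAMPLES = ["x", "xa", "x-", "supercalifragilisticexpialidoceus",
            "x1234567890abc", "a123-xyz", "y-----"]

for tag in EXAMPLES:
    draft_ok = DRAFT_GRANDFATHERED.fullmatch(tag) is not None
    rfc_ok = RFC3066_TAG.fullmatch(tag) is not None
    print(f"{tag:35} draft -08: {str(draft_ok):5}  RFC 3066: {rfc_ok}")
```

Every example matches the draft's "grandfathered" production, while RFC 3066 rejects the four tags given earlier (and "x-"), accepting only "x" and "xa".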

I say the ABNF from draft -08 (quoted above) allows those;
you say no.  Either you're looking at different ABNF or one
or more of us doesn't understand ABNF.  If you wish to
convince me that I don't understand it, you'll have to do
better than simply claiming that I'm wrong with no supporting
reasoning.

> > Specifically, the draft allows, and RFC 3066 disallows:
> >     subtags more than 8 octets in length
>
> This is incorrect. It was true of an earlier draft, but that was
> changed.

The "new last call" was for version -08; I downloaded it
from the URI in the new last call and copied the ABNF
above from that.  My analysis is above.  I await your
rebuttal or retraction.

> >     hyphens which do not separate subtags
> >     zero-length subtags
>
> These near-equivalent statements are incorrect. No hyphen may be
> permitted without a non-initial sub-tag, and no sub-tag can be an empty
> string.

See the "y-----" example above, based on the published
ABNF. Again, I await your rebuttal or retraction.

> >     primary tags which are not purely alphabetic
>
> This is incorrect. A primary sub-tag must be 2*3ALPHA or 4*8ALPHA, or
> "i" or "x".

See the "a123-xyz" example above (in RFC 3066 parlance,
the "a123" part is the primary tag, which clearly contains
DIGITs).  One more time, I await your rebuttal or
retraction.

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf

