Re: Last Call: 'Tags for Identifying Languages' to BCP

"JFC (Jefsey) Morfin" <jefsey@xxxxxxxxxx> · Fri, 26 Aug 2005 02:55:52 +0200

On 00:40 26/08/2005, David Hopwood said:
JFC (Jefsey) Morfin wrote:
[...] Today, the common practice of nearly one billion Internet 
users is to be able to turn off cookies to protect their anonymous, 
free use of the web. Once the Draft is put into practice, a 
conflicting privacy violation will be imposed on them: "tell me what 
you read, I will tell you who you are". Any OPES can monitor the 
exchange, extract these unambiguous ASCII tags, and know (or block) 
what you read. You can search for these tags in Google and learn a 
lot about people. There is no proposed way to turn this personal 
tagging off, nor to encode it.

I don't know which browser you use, but in Firefox, I can configure exactly
which language tags it sends. If it were sending other information using
language tags as a covert channel (which it *could* do regardless of the
draft under discussion), I'd expect that to be treated as at least a bug,
and if it were a deliberate privacy violation, I'd expect that to cause a
big scandal.

Dear David,
the privacy problem is the "what you read, who you are" intelligence 
leak. Today langtags are not yet used much in XML, HTML, etc., 
compared with how much they should be (as the W3C people in the 
WG-ltru say). That is what this proposal is all about: to give, in 
_one_shot_ and in a _standardised_ way, the language, the script and 
the country. It uses ISO codes for that. ISO never intended to 
propose a code in which:

ar-arab-us marks texts intended for people interested in US Arabic 
community issues;
iw-hebr-ru marks texts intended for people interested in the Russian 
Jewish community;
etc.
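The following minimal sketch shows how mechanically such a tag can be split into its language, script and country parts. It assumes only a simplified pattern: a 2 or 3 letter language, an optional 4 letter script, and an optional 2 letter or 3 digit region. The full Draft grammar (variants, extensions, private use, grandfathered tags) is deliberately left out. `split_langtag` and `LangTagParts` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class LangTagParts(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


# Simplified subset only (an assumption, not the Draft's full ABNF):
# language (2-3 letters), optional script (4 letters),
# optional region (2 letters or 3 digits).
_SIMPLE_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def split_langtag(tag: str) -> Optional[LangTagParts]:
    """Return the language/script/region parts of a simple tag, or None."""
    m = _SIMPLE_TAG.fullmatch(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    # Tags are case-insensitive; normalise to the usual display casing.
    return LangTagParts(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )
```

For example, `split_langtag("ar-arab-us")` yields `LangTagParts(language='ar', script='Arab', region='US')`. Any observer holding the tag gets the same three facts just as easily.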

When your browser accepts these langtags and you carry on the 
exchange, this structured information can be filtered by ISPs (for 
policing, censorship, intelligence gathering, etc.) to learn about 
their users. It can be used for large-scale searches in search 
engines to learn which mail you responded to, etc. I suppose that in 
most countries of the world this is subject to privacy laws. I think 
that in France it is subject to the anti-racism law (the one used 
against Yahoo a few years ago).
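This minimal sketch shows how little effort it takes for an intermediary that sees a page in transit to collect its language tags. It assumes the tags are carried in HTML `lang` / `xml:lang` attributes, and it uses only Python's standard `html.parser`. `LangTagCollector` and `langtags_in_page` are hypothetical names, and the sample markup in the tests is made up.

```python
from html.parser import HTMLParser
from typing import List


class LangTagCollector(HTMLParser):
    """Collect every lang / xml:lang attribute value seen in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases attribute names, so "xml:lang" matches as-is.
        for name, value in attrs:
            if name in ("lang", "xml:lang") and value:
                self.tags.append(value)
    # Self-closing tags reach handle_starttag through the default
    # handle_startendtag, so no extra override is needed.


def langtags_in_page(html_text: str) -> List[str]:
    """Return the language tags found in a page, in document order."""
    collector = LangTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tags
```

A filter built on this needs no knowledge of the page content, only the tags.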

The problem is that there is no way for the _receiver_ to turn it 
off. This is potentially dangerous spam: it is digital information I 
never asked for, which discloses information about me.

Is that a reason to kill the Draft? I do not think so, but it 
certainly shows the complexity of the issue, and the Draft's lack of 
preparation (I proposed the Security section to give a better warning 
about the problem). The IETF offers a solution: OPES. An OPES on the 
host side can remove or encrypt the langtags at the reader's request, 
without any change on the host. I tried to make the WG-ltru 
understand that documenting langtags without considering (or at 
least mentioning) OPES at the same time is criminal.

This is why I make a default proposition: the Draft's ABNF and 
system are considered the starting default, with hooks open to IRI 
tags adapted to each situation, at the decision of the user or of 
services he trusts.

Let us take the case above. A service provider can offer an OPES 
service that changes "he-hebr-us" into "x-abcf", and give the user an 
internal OPES plug-in that restores x-abcf into he-hebr-us, so his 
libraries still work. And many L9 organisations/Governments are 
satisfied. He can even provide dynamically updated langtag aliases. 
However, a good service should guarantee that it is conflict-free. 
This is no problem if the langtag alias is x-service.com:abcf 
(conforming with the URI Tag RFC), but this is forbidden by the 
Draft. My proposition is to use "0-" as a hook to a specific format, 
so that the Draft ABNF is fully respected.
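The aliasing service described above can be sketched as a pair of lookup tables: one applied by the provider's OPES before the content reaches the network, and one applied by the user's plug-in to restore the original tag. The table contents and the names `ALIASES`, `to_alias` and `from_alias` are hypothetical. Only the "he-hebr-us" / "x-abcf" pair comes from the example above.

```python
from typing import Dict

# Hypothetical alias table a service provider might push to its users.
# "x-" marks a private-use tag in RFC 3066, so the alias is itself a
# syntactically valid tag that says nothing about the content.
ALIASES: Dict[str, str] = {
    "he-hebr-us": "x-abcf",
}
REVERSE: Dict[str, str] = {alias: tag for tag, alias in ALIASES.items()}


def to_alias(tag: str) -> str:
    """Provider-side OPES step: replace a known tag by its opaque alias."""
    return ALIASES.get(tag.lower(), tag)


def from_alias(tag: str) -> str:
    """User-side plug-in step: restore the original tag for local libraries."""
    return REVERSE.get(tag.lower(), tag)
```

Unknown tags pass through unchanged. Nothing in such a table stops two providers from both issuing `x-abcf` with different meanings. That is the conflict the paragraph is concerned with, and the reason for wanting a scoped form such as x-service.com:abcf.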

In that case "0-service.com:abcf" will not raise an error, and will 
not conflict with people using the default format (the format 
proposed by the Draft). The interest of "0-" is that it can be 
multilingual, so the hook can work in ASCII but also in punycode, and 
in any script. It can also be entirely numeric and possibly refer 
directly to an IPv6 address, making the scheme independent of domain 
names.
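This minimal sketch shows the routing idea behind the proposed "0-" hook, assuming it works as a plain prefix test. Tags starting with "0-" go to whatever resolver the user trusts, and everything else goes to ordinary Draft-format processing. It also shows how a non-ASCII domain in the payload could be turned into its punycode form with Python's built-in `idna` codec (IDNA 2003). `HOOK`, `classify_tag` and `ascii_payload` are hypothetical names. The sketch does not check the payload against any grammar.

```python
from typing import Tuple

HOOK = "0-"  # the prefix proposed in the message; hypothetical, not standardised


def classify_tag(tag: str) -> Tuple[str, str]:
    """Route a tag to the '0-' hook or to default (Draft-format) handling.

    Returns ("hook", payload) when the tag starts with the hook prefix;
    the payload is handed untouched to a resolver the user trusts.
    Otherwise returns ("default", tag).
    """
    if tag.startswith(HOOK):
        return "hook", tag[len(HOOK):]
    return "default", tag


def ascii_payload(payload: str) -> str:
    """Convert the domain part (before the first ':') to ASCII/punycode."""
    domain, sep, rest = payload.partition(":")
    return domain.encode("idna").decode("ascii") + sep + rest
```

For example, `classify_tag("0-service.com:abcf")` returns `("hook", "service.com:abcf")`. Running `ascii_payload` on a payload whose domain is "bücher.example" gives an ASCII domain starting with "xn--".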

I support it as a transitional standards-track RFC needed by some, 
as long as it does not exclude more specific/advanced language 
identification formats, processes or future IANA or ISO 11179 
conformant registries.

The grammar defined in the draft is already flexible enough.
(I suppose you mean more than just grammar. Talking of the ABNF is 
probably clearer?).
I am certainly eager to learn how I can support modal information 
(type of voice, accent, signs, icons, feelings, font, etc.), medium 
information, language references (for example: is it plain, basic, 
popular English? which dictionary, which software publisher?), the 
context (style, relation, etc.), the nature of the text (monolingual 
or multilingual, human- or machine-oriented; for example, what is the 
tag to use for a multilingual file [printed in a language of 
choice]?), the date of the langtag version being used, etc.

I mean that the grammar is flexible enough to encode any of the 
above attributes (not that it would be useful or a good idea to encode most
of them).

Hmm... you take responsibility for both statements :-)
- that you _can_ encode it. But you do not provide examples.
- that it would not be useful or a good idea to encode basic content 
object attributes.

The Draft has introduced the "script" subtag in addition to RFC 
3066 (which is an obvious change). However, in order to stay 
"compatible" with RFC 3066, the author says it cannot introduce 
specific support for URI tags.

This objection seems to be correct: URI tags include characters not 
allowed by RFC 3066.
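The incompatibility can be checked directly against the RFC 3066 ABNF (section 2.1): a primary subtag of 1 to 8 ASCII letters, followed by any number of "-"-separated subtags of 1 to 8 ASCII letters or digits. This minimal sketch encodes that rule as a regular expression. `is_rfc3066_tag` is a hypothetical helper name.

```python
import re

# RFC 3066, section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# ALPHA and DIGIT are ASCII only.
_RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag: str) -> bool:
    """True if tag matches the RFC 3066 Language-Tag production."""
    return _RFC3066.fullmatch(tag) is not None
```

With this check, "he-hebr-us" and "x-abcf" pass. A URI-tag style alias such as "x-service.com:abcf" fails because of the "." and ":".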

Then? The purpose of this work is to address the limitations of RFC 
3066. URI tags did not exist when RFC 3066 was written. Do you mean 
for example that langtags are to be ASCII only because RFC 3066 was ASCII only?

But you could easily encode the equivalent information to a URI 
tag, if you wanted to.

Please document how you would do it, while respecting the hybrid 
format of the proposed ABNF, where information is identified not 
only by fixed position but also by relative position and size, with 
"-" as the sole separator. And they want to keep labels between "-" 
at most 8 characters long. Tell me how you support IDNs.

Let us suppose that I have "lang-tags.org:" as a scheme,
or "xn--abcdef.com:". Tell me how you support them.
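The constraint can be checked mechanically. This minimal sketch assumes the rule of 1 to 8 ASCII alphanumeric characters per subtag, with "-" as the only separator (as in RFC 3066 and the Draft). It lists every piece of a string that could not survive as a subtag. `subtag_problems` is a hypothetical helper name.

```python
import re
from typing import List

# One subtag: 1 to 8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def subtag_problems(text: str) -> List[str]:
    """Return each '-'-separated piece of text that is not a legal subtag."""
    return [piece for piece in text.split("-")
            if _SUBTAG.fullmatch(piece) is None]
```

For "lang-tags.org:" the offending piece is "tags.org:". For "xn--abcdef.com:" the IDN prefix's double hyphen alone already produces an empty subtag, before the "." and ":" are even considered.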
jfc

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf