Re: Last Call: 'Tags for Identifying Languages' to BCP

On 00:40 26/08/2005, David Hopwood said:
JFC (Jefsey) Morfin wrote:
[...] Today, the common practice of nearly one billion Internet users is to be able to turn off cookies to protect their anonymous free usage of the web. Once the Draft enters into action, a conflicting privacy violation will be imposed on them: "tell me what you read, I will tell you who you are": any OPES can monitor the exchange, extract these unambiguous ASCII tags, and know (or block) what you read. You can search for these tags in Google and learn a lot about people. There is no proposed way to turn that personal tagging off, nor to encode it.

I don't know which browser you use, but in Firefox, I can configure exactly
which language tags it sends. If it were sending other information using
language tags as a covert channel (which it *could* do regardless of the
draft under discussion), I'd expect that to be treated as at least a bug,
and if it were a deliberate privacy violation, I'd expect that to cause a
big scandal.

Dear David,
the privacy problem is the "what you read, who you are" intelligence leak. Today langtags are not yet much used (say the W3C people in the WG-ltru) compared with how much they should be in XML, HTML, etc. That is all this proposal is about. The proposal is to give in _one_shot_ and in a _standardised_ way the language, the script and the country. It uses ISO codes for that. ISO never wanted to propose such a code where:

ar-arab-us are texts destined for people interested in US Arabic community issues,
iw-hebr-ru are texts destined for people interested in the Jewish Russian community,
etc.

When your browser accepts these langtags and you continue the exchange, this structured information can be filtered by ISPs (for police, censoring, intelligence gathering, etc.) to learn about their users. It can be used for large-scale searches in search engines to find out which mail you responded to, etc. I suppose that in most countries of the world this is subject to privacy laws. I think that in France it is subject to the anti-racist law (the one used against Yahoo a few years ago).

The problem is that there is no way for the _receiver_ to turn it off. This is potentially dangerous spam: it is digital information I never asked for, which discloses information about me.

Is that a reason to kill the Draft? I do not think so, but it certainly shows the complexity of the issue - and the lack of preparation of the Draft (I proposed the Security section to better warn about the problem). The IETF proposes a solution: OPES. An OPES on the host side can remove the langtags or encrypt them at the request of the reader, without a change on the host. I tried to make the WG-ltru understand that not considering/mentioning OPES at the same time as documenting langtags is criminal.

This is the reason for the default proposal I make: the Draft's ABNF and system are taken as a starting default, with hooks open to IRI Tags adapted to each situation at the decision of the user or of services he trusts.

Let us take the case above. A service provider can propose an OPES service changing "he-hebr-us" into "x-abcf", and an internal OPES plug-in for the user restoring x-abcf into he-hebr-us, so his libraries work. And many L9 organisations/Governments are satisfied. He can even provide dynamically updated langtag aliases. However, a good service should guarantee that the service is conflict free. This is no problem if the langtag alias is x-service.com:abcf (conforming with the URI Tag RFC), but this is forbidden by the Draft. My proposal is to use "0-" as a hook to a specific format, so the Draft ABNF is fully respected.
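
The alias round trip described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an OPES implementation: the static table and the function names alias_outbound and restore_inbound are hypothetical, and a real service would need the dynamic, conflict-free alias management mentioned above.

```python
# Hypothetical alias table: real langtag -> private-use alias.
ALIASES = {"he-hebr-us": "x-abcf"}
# Reverse table, as the plug-in on the user's side would hold it.
RESTORE = {alias: tag for tag, alias in ALIASES.items()}


def alias_outbound(tag: str) -> str:
    """Replace a known langtag with its alias; pass others through unchanged."""
    return ALIASES.get(tag.lower(), tag)


def restore_inbound(tag: str) -> str:
    """Restore the original langtag from an alias; pass others through unchanged."""
    return RESTORE.get(tag.lower(), tag)


on_the_wire = alias_outbound("he-hebr-us")
print(on_the_wire)                   # x-abcf
print(restore_inbound(on_the_wire))  # he-hebr-us
```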

In that case "0-service.com:abcf" will not raise an error, and will not conflict with people using the default format (the format proposed by the Draft). The interest of "0-" is that it can be multilingual, so the hook can work in ASCII but also in punycode, and in any script. It can also be entirely numeric and possibly refer directly to an IPv6 address, making the scheme DN independent.

I support it as a transition standard track RFC needed by some, as long as it does not exclude more specific/advanced language identification formats, processes or future IANA or ISO 11179 conformant registries.

The grammar defined in the draft is already flexible enough.
(I suppose you mean more than just grammar. Talking of the ABNF is probably clearer?). I am certainly eager to learn how I can support modal information (type of voice, accent, signs, icons, feelings, font, etc.), medium information, language references (for example is it plain, basic, popular English? used dictionary, used software publisher), the context (style, relation, etc.), the nature of the text (mono, multilingual, human or machine oriented - for example what is the tag to use for a multilingual file [printed in a language of choice]), the date of the langtag version being used, etc.

I mean that the grammar is flexible enough to encode any of the above attributes (not that it would be useful or a good idea to encode most
of them).

hmmm.... you take responsibility for both claims :-)
- that you _can_ encode it. But you do not provide examples.
- that it would not be useful or a good idea to encode basic content object attributes.

The Draft has introduced the "script" subtag in addition to RFC 3066 (which is an obvious change). However in order to stay "compatible" with RFC 3066, the author says it cannot introduce specific support for URI tags.

This objection seems to be correct: URI tags include characters not allowed by RFC 3066.
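
For reference, the RFC 3066 syntax at issue (Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT), joined by "-") can be checked mechanically. The sketch below covers only that well-formedness rule; the function name is_rfc3066_wellformed is made up for the example, and it says nothing about registration or about the Draft's own ABNF.

```python
import re

# RFC 3066: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT), joined by "-".
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_wellformed(tag: str) -> bool:
    """True if the whole string matches the RFC 3066 Language-Tag syntax."""
    return RFC3066_TAG.fullmatch(tag) is not None


for candidate in ("he-hebr-us", "x-abcf", "x-service.com:abcf"):
    print(candidate, is_rfc3066_wellformed(candidate))
```

The first two candidates pass; the third fails because "." and ":" are not allowed in a subtag.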

Then? The purpose of this work is to address the limitations of RFC 3066. URI tags did not exist when RFC 3066 was written. Do you mean for example that langtags are to be ASCII only because RFC 3066 was ASCII only?

But you could easily encode the equivalent information to a URI tag, if you wanted to.

Please document how you do that while respecting the hybrid format of the proposed ABNF, where information is identified not only by fixed position but also by relative position and size, with "-" as sole separator. And they want to keep labels between "-" at most 8 characters long. Tell me how you support IDNs.

Let us suppose that I have "lang-tags.org:" as a scheme,
or "xn--abcdef.com:". Tell me how you support them.
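
To make the difficulty concrete, here is a small Python sketch that splits a candidate on "-" and lists which labels break the constraints named above (alphanumeric only, at most 8 characters). The function name label_problems and the wording of the messages are assumptions for the example; it is a reading of the constraints as stated in this mail, not a validator for the Draft's ABNF.

```python
def label_problems(tag: str) -> list[str]:
    """List the labels (split on "-") that are empty, too long, or non-alphanumeric."""
    problems = []
    for position, label in enumerate(tag.split("-"), start=1):
        if not label:
            problems.append(f"label {position} is empty")
            continue
        if len(label) > 8:
            problems.append(f"label {position} ({label!r}) is longer than 8 characters")
        if not (label.isascii() and label.isalnum()):
            problems.append(f"label {position} ({label!r}) has non-alphanumeric characters")
    return problems


for scheme in ("lang-tags.org:", "xn--abcdef.com:"):
    print(scheme)
    for problem in label_problems(scheme):
        print("  ", problem)
```

Note that the "--" of a punycode prefix alone already produces an empty label when "-" is the sole separator.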
jfc


