a way toward homograph resolution ? (was "improving WG operation")


 



On 04:43 11/05/2005, Randy Presuhn said:
From: "JFC (Jefsey) Morfin" <jefsey@xxxxxxxxxx>
> To: "Hallam-Baker, Phillip" <pbaker@xxxxxxxxxxxx>
> Cc: <ietf@xxxxxxxx>
> Sent: Tuesday, May 10, 2005 5:29 PM
> Subject: RE: improving WG operation
...
> They do not not only delete. I suggest you just come to the WG-ltru where
> they have decided to document RFC 2277 charsets into RFC 3066 langtags. So
> you can enjoy charset conflicts, something you never though about, I
> presume. You cannot stop progress.
...

I guess Jefsey is upset because the WG rejected his proposal
to expand our scope to include charsets.  The ltru WG is most
emphatically *not* confusing charsets with language tags.

I am not upset :-). On the contrary, I find it extremely interesting that some people were able to rename charsets "scripts" in order to insert charsets into language descriptions while claiming they don't (cf. above). Obviously they are unhappy when I expose the trick. Anyway, the result is great fun: people will be prevented from accessing a page they know how to read if they do not know the language.



This cacologic, however, might be a good way to solve the IDN homograph issue and the phishing problem.


If we revert from those famous "scripts" to what they are, i.e. Unicode partitions, hence stable and well-documented charsets (http://www.unicode.org/Public/4.1.0/ucd/Scripts.txt), browsers can use them to expose the homographs in IDNs that do not belong to the page charset, and so kill the risk of phishing.

This only calls for the browser to extract the charset, I mean the script name, from the langtag, fetch this file, read the list of code points in the charset (those associated with the script), and display the URL accordingly, indicating the characters which are not part of the script/charset. This relieves the ccTLD/TLD Manager of responsibilities he cannot fulfil at the third level and beyond.
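The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the inline sample is an abbreviated excerpt in the line format of Scripts.txt (not the full file), and the function names (parse_scripts, script_of, foreign_characters, mark) are hypothetical helpers, not part of any browser or library. Treating the "Common" and "Inherited" scripts as neutral is also an assumption.

```python
import re

# Abbreviated sample in the Scripts.txt line format (NOT the complete file).
SAMPLE_SCRIPTS_TXT = """\
# sample excerpt, for illustration only
002D          ; Common # Pd       HYPHEN-MINUS
002E          ; Common # Po       FULL STOP
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
0400..0481    ; Cyrillic # L& [130] CYRILLIC CAPITAL LETTER IE WITH GRAVE..CYRILLIC SMALL LETTER KOPPA
"""

# "XXXX ; Name" or "XXXX..YYYY ; Name"; comment and blank lines do not match.
LINE_RE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")

def parse_scripts(text):
    """Return a list of (first_code_point, last_code_point, script_name)."""
    ranges = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            first = int(m.group(1), 16)
            last = int(m.group(2), 16) if m.group(2) else first
            ranges.append((first, last, m.group(3)))
    return ranges

def script_of(ch, ranges):
    """Look up the script name of one character, or 'Unknown' if not listed."""
    cp = ord(ch)
    for first, last, name in ranges:
        if first <= cp <= last:
            return name
    return "Unknown"

def foreign_characters(label, page_script, ranges, neutral=("Common", "Inherited")):
    """List (index, character, script) for characters outside the page script."""
    result = []
    for i, ch in enumerate(label):
        name = script_of(ch, ranges)
        if name != page_script and name not in neutral:
            result.append((i, ch, name))
    return result

def mark(label, page_script, ranges):
    """Return the label with every foreign character wrapped in brackets."""
    flagged = {i for i, _, _ in foreign_characters(label, page_script, ranges)}
    return "".join(f"[{ch}]" if i in flagged else ch for i, ch in enumerate(label))

if __name__ == "__main__":
    ranges = parse_scripts(SAMPLE_SCRIPTS_TXT)
    # Second letter is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
    print(mark("p\u0430ypal.com", "Latin", ranges))
```

A real implementation would load the full file and use a faster lookup than a linear scan (for example a sorted list with binary search); the linear scan here only keeps the sketch short.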

There are, however, still (minor) points to address:
- there are some minor disparities between the "script" name in the langtag and the script name in the Scripts.txt file; these should be reduced over time. I suppose that if this is a major issue, there will be help.
- the Scripts.txt file is currently hosted on the Unicode site. Even if it is cached (92 K), it will be fetched every time people start their browser. This may therefore represent several billion accesses a day.
- the WG-ltru only really wants to address XML issues related to old XML libraries. Some coordination with other WGs or interests could be fruitful. They plan to extend the language tags registry to scripts and to register them. I suppose other WGs could benefit from this (all those involved in one way or another with internationalisation and languages).
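
To illustrate the first point: the script subtag in a langtag is a four-letter ISO 15924 code (e.g. "Cyrl"), while Scripts.txt uses the long name (e.g. "Cyrillic"), so some mapping is needed in between. The following is a minimal sketch, assuming a small hand-written table; the table CODE_TO_NAME and the functions script_subtag and scripts_txt_name are hypothetical, and the subtag detection is deliberately simplified (it does not handle grandfathered tags).

```python
# Hypothetical, incomplete mapping from ISO 15924 codes to Scripts.txt names.
CODE_TO_NAME = {
    "Latn": "Latin",
    "Cyrl": "Cyrillic",
    "Grek": "Greek",
    "Arab": "Arabic",
    "Hani": "Han",
}

def script_subtag(langtag):
    """Return the four-letter script subtag of a langtag, or None if absent."""
    subtags = langtag.split("-")
    for sub in subtags[1:]:           # skip the primary language subtag
        if len(sub) == 1:             # singleton: extension or private use follows
            break
        if len(sub) == 4 and sub.isascii() and sub.isalpha():
            return sub.title()        # normalise case, e.g. "cyrl" -> "Cyrl"
    return None

def scripts_txt_name(langtag):
    """Return the Scripts.txt name for the langtag's script, or None if unknown."""
    code = script_subtag(langtag)
    return CODE_TO_NAME.get(code) if code else None

if __name__ == "__main__":
    for tag in ("sr-Cyrl-CS", "sr-Latn", "en-US"):
        print(tag, "->", scripts_txt_name(tag))
```

A tag without a script subtag, such as "en-US", yields None here; how a browser should then pick a default script is left open.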


jfc






_______________________________________________ Ietf@xxxxxxxx https://www1.ietf.org/mailman/listinfo/ietf
