Re: a way toward homograph resolution ?

"Randy Presuhn" <randy_presuhn@xxxxxxxxxxxxxx> · Tue, 10 May 2005 23:18:39 -0700

Hi -

Let it suffice for me to say that I believe the gentleman is mistaken.
I do not intend to waste additional bandwidth on this thread.
Those interested in ltru and its work will find our charter at
http://www.ietf.org/html.charters/ltru-charter.html and our archives at
http://www.ietf.org/mail-archive/web/ltru/index.html

Randy, ltru co-chair

----- Original Message ----- 
> From: "JFC (Jefsey) Morfin" <jefsey@xxxxxxxxxx>
> To: <ietf@xxxxxxxx>
> Cc: <idn@xxxxxxxxxxxx>
> Sent: Tuesday, May 10, 2005 9:08 PM
> Subject: a way toward homograph resolution ? (was "improving WG operation")
>

> On 04:43 11/05/2005, Randy Presuhn said:
> > > From: "JFC (Jefsey) Morfin" <jefsey@xxxxxxxxxx>
> > > To: "Hallam-Baker, Phillip" <pbaker@xxxxxxxxxxxx>
> > > Cc: <ietf@xxxxxxxx>
> > > Sent: Tuesday, May 10, 2005 5:29 PM
> > > Subject: RE: improving WG operation
> >...
> > > They do not only delete. I suggest you just come to the WG-ltru, where
> > > they have decided to document RFC 2277 charsets into RFC 3066 langtags. So
> > > you can enjoy charset conflicts, something you never thought about, I
> > > presume. You cannot stop progress.
> >...
> >
> >I guess Jefsey is upset because the WG rejected his proposal
> >to expand our scope to include charsets.  The ltru WG is most
> >emphatically *not* confusing charsets with language tags.
>
> I am not upset :-). On the contrary, I find it extremely interesting that some
> people were able to rename charsets "scripts" in order to insert charsets
> into language descriptions while claiming they don't (cf. above). Obviously
> they are unhappy when I expose the trick. Anyway, the result is great fun:
> people will be prevented from accessing a page they are able to read if they
> do not know the language.
>
>
> This cacology, however, might be a good way to solve the IDN homograph issue
> and the phishing problem.
>
> If we turn those famous "scripts" back into what they are, i.e. Unicode
> partitions, and hence stable and well-documented charsets
> (http://www.unicode.org/Public/4.1.0/ucd/Scripts.txt), browsers can use them
> to expose the homographs in IDNs that do not belong to the page charset, and
> kill the risk of phishing.
>
> This only requires browsers to extract the charset, I mean the script
> name, from the langtag, load this file, read the list of code points in the
> charset (i.e. associated with the script), and display the URL accordingly,
> marking the characters that are not part of the script/charset. This
> relieves the ccTLD/TLD manager of responsibilities he cannot fulfil at the
> third level and below.
>
> There are, however, still (minor) points to address:
> - there are some minor disparities between the "script" name in the
> langtag and the script name in the Scripts.txt file; these should be reduced
> over time. I suppose that if this is a major issue, there will be help.
> - the Scripts.txt file is currently hosted on the Unicode site. Even if it is
> cached (92 K), it will be fetched every time people start their browser.
> This may therefore represent several billion accesses a day.
> - the WG-ltru only really wants to address XML issues related to old XML
> libraries. Some coordination with other WGs or interests could be fruitful.
> They plan to extend the language tag registry to scripts and to register
> them. I suppose other WGs could benefit from this (all those involved in one
> way or another with internationalisation and languages).
>
> jfc
>
> _______________________________________________
> Ietf mailing list
> Ietf@xxxxxxxx
> https://www1.ietf.org/mailman/listinfo/ietf
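
The browser-side check proposed in the quoted message can be sketched in Python as below. The proposal has three steps: take the script subtag from the language tag, look up each character of the host name in Unicode's Scripts.txt, and mark the characters that fall outside that script. This is a minimal illustration under stated assumptions. It is not an existing browser feature or part of any ltru document.

The function names, the abridged Scripts.txt excerpt and the three-entry ISO 15924 mapping are all hypothetical. Language tags carry four-letter ISO 15924 codes such as "Latn", while Scripts.txt uses names such as "Latin", so some mapping is needed. Script names are compared case-insensitively. Characters in the Common script (digits, hyphen, full stop) are always accepted. The script-subtag parser is deliberately simplified.

```python
import bisect

# Abridged excerpt in the format of Unicode's Scripts.txt
# ("code point or range ; script # comment"). A real browser would load the full file.
SAMPLE_SCRIPTS_TXT = """\
# Scripts.txt excerpt (illustration only)
002D          ; Common # Pd       HYPHEN-MINUS
002E..002F    ; Common # Po   [2] FULL STOP..SOLIDUS
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
0400..0481    ; Cyrillic # L& [130] CYRILLIC CAPITAL LETTER IE WITH GRAVE..CYRILLIC SMALL LETTER KOPPA
"""

# Hypothetical mapping from ISO 15924 codes (as used in language tags) to
# Scripts.txt script names; only the entries this example needs.
ISO15924_TO_UCD = {"Latn": "Latin", "Cyrl": "Cyrillic", "Grek": "Greek"}


def parse_scripts(text):
    """Return a sorted list of (start, end, lowercased script name) ranges."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, script = (field.strip() for field in data.split(";"))
        start, _, end = cps.partition("..")  # single code point has no ".."
        ranges.append((int(start, 16), int(end or start, 16), script.lower()))
    ranges.sort()
    return ranges


def script_of(ch, ranges, starts):
    """Binary-search the range containing ch; 'unknown' if no range covers it."""
    cp = ord(ch)
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return "unknown"


def script_subtag(langtag):
    """Simplified: first all-letter 4-character subtag before any singleton."""
    for sub in langtag.split("-")[1:]:
        if len(sub) == 1:  # extension or private-use singleton: stop looking
            break
        if len(sub) == 4 and sub.isalpha():
            return sub.title()
    return None


def mark_foreign(host, langtag, ranges):
    """Bracket each character of host outside the tag's script (Common allowed)."""
    code = script_subtag(langtag)
    if code is None or code not in ISO15924_TO_UCD:
        return host  # no usable script subtag: nothing to compare against
    allowed = {ISO15924_TO_UCD[code].lower(), "common"}
    starts = [r[0] for r in ranges]
    out = []
    for ch in host:
        if script_of(ch, ranges, starts) in allowed:
            out.append(ch)
        else:
            out.append("[" + ch + "]")
    return "".join(out)
```

For example, set `ranges = parse_scripts(SAMPLE_SCRIPTS_TXT)` and call `mark_foreign("p\u0430ypal.com", "en-Latn", ranges)`. The result is the host name with the Cyrillic U+0430 bracketed, `p[а]ypal.com`. A tag with no script subtag, such as "en", leaves the name unchanged, since there is nothing to compare against.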
