Re: How to avoid auto-linking in non-ascii URLs

Hi, thank you for your quick reply.

In <200603221712.k2MHC9230101@xxxxxxxxxxxxxxxxxx>,
 earl@xxxxxxxxxxxx wrote:
> On March 23, 2006 at 01:36, Masao Takaku wrote:
> > MHonArc automatically turns URL-like strings into links.
> > When a message includes a string "See http://www.example.com/foo/bar/",
> > MHonArc processes it as follows:
> > 
> > See <a href="http://www.example.com/foo/bar/">http://www.example.com/foo/bar/
> > </a>
> > 
> > This works well, but when a URL-like string is followed by non-ASCII
> > text without a space, the feature is not useful;
> > e.g. "http://www.example.com/foo/bar/を見て.", which means
> > "See http://www.example.com/foo/bar/" in Japanese, comes out as follows:
> > 
> > <a href="http://www.example.com/foo/bar/&#x3092;&#x898B;&#x3066;">http://www.example.com/foo/bar/&#x3092;&#x898B;&#x3066;</a>.
> > 
> > In this example, the output should look like the following:
> > 
> > <a href="http://www.example.com/foo/bar/">http://www.example.com/foo/bar/</a>
> > &#x3092;&#x898B;&#x3066;.
> > 
> > My environment is Perl-5.8.0 and MHonArc-2.6.15 (default setting).
> > 
> > Does anyone know how to do this, or any workarounds?
> First, you may want to check out <http://www.mhonarc.jp/> for
> Japanese-specific usage information on MHonArc.  There should also
> be links to a Japanese-based mailing list which may be useful.

<http://www.mhonarc.jp/2.6.x/iso2022jp.html#summary>, an rcfile for
ISO-2022-JP encoding, is a good resource and works fine.
With resource settings based on ISO-2022-JP, URL linking is cut off
at non-ASCII text. This seems to be a workaround for my

> As for your specific problem, you may need to disable URL linking.
> This can be done by specifying -nourl on the command line or
> <NOURL> in your resource file.  The '&' is a legal URL character,
> and MHonArc does not try to work out what character entity references
> resolve to when deciding whether they should be included.

No... disabling URL linking is not what I want.
# URL linking mostly works, except for URLs followed by non-ASCII text.
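
To make the failure mode concrete, here is a minimal Python sketch. The
pattern NAIVE_URL and the function naive_link are hypothetical stand-ins,
not MHonArc's actual code. The sketch only assumes that linking runs after
non-ASCII characters have been converted to numeric character references,
and that the URL character class accepts '&', '#', and ';'.

```python
import re

# Hypothetical pattern, NOT MHonArc's real regex. It accepts any run of
# characters other than whitespace, quotes, and angle brackets, and refuses
# to end the URL on '.' or ','. Since '&', '#', and ';' are allowed,
# character references are absorbed into the URL.
NAIVE_URL = re.compile(r'https?://[^\s<>"]*[^\s<>".,]')


def naive_link(html_text: str) -> str:
    """Wrap every URL-like run in an anchor element."""
    return NAIVE_URL.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', html_text
    )


# Message body after non-ASCII characters became character references.
converted = "http://www.example.com/foo/bar/&#x3092;&#x898B;&#x3066;."
print(naive_link(converted))
```

Under these assumptions the three character references end up inside both
the href value and the link text, which is the output I am seeing.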


It's true that '&' is a legal URL character, but U+3092 is an
invalid character in a URL, and the numeric character reference
"&#x3092;" is equivalent to U+3092 in HTML. Also, how non-ASCII URLs
are interpreted, at least in Japanese encodings, is very dependent on browser/server

Is this assumption also true in other languages/encodings?

If so, I think that MHonArc, even with default settings, should treat
these characters as invalid URL characters in its URL linking code.
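
A rough Python sketch of the behaviour I am proposing follows. All names
here (URL_CANDIDATE, NUMERIC_REF, url_end, link_urls) are hypothetical and
the pattern is an assumption, not MHonArc's code. The idea is to match a
URL candidate as before, then cut it at the first numeric character
reference that resolves to a code point above U+007F.

```python
import re

# Hypothetical candidate pattern (an assumption, not MHonArc's regex).
URL_CANDIDATE = re.compile(r'https?://[^\s<>"]*[^\s<>".,]')
# Hexadecimal (&#x3092;) or decimal (&#12434;) numeric character reference.
NUMERIC_REF = re.compile(r'&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));')


def url_end(candidate: str) -> int:
    """Return the index at which the URL should stop inside the candidate."""
    for ref in NUMERIC_REF.finditer(candidate):
        hex_digits, dec_digits = ref.group(1), ref.group(2)
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point > 0x7F:  # resolves to a non-ASCII character
            return ref.start()
    return len(candidate)


def link_urls(html_text: str) -> str:
    """Link URLs, leaving non-ASCII character references outside the link."""
    def replace(match: re.Match) -> str:
        candidate = match.group(0)
        end = url_end(candidate)
        url, rest = candidate[:end], candidate[end:]
        return f'<a href="{url}">{url}</a>{rest}'

    return URL_CANDIDATE.sub(replace, html_text)


print(link_urls("http://www.example.com/foo/bar/&#x3092;&#x898B;&#x3066;."))
```

Named references such as "&amp;" are left alone in this sketch, so ordinary
query strings would still be linked in full. It is still a single
substitution pass; only the replacement step does extra work.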

> The URL linking code is a single regex operation.
> I'm not sure at this time on what code changes could be done.
> If you go with ISO-2022-JP encoding for your archives, it may
> avoid this problem.
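
One possible reason ISO-2022-JP avoids the problem, sketched in Python:
in that encoding Japanese text is introduced by an escape sequence, and
the ESC byte is not a URL character. This is only a guess at the
mechanism. URL_BYTES is a hypothetical pattern, not MHonArc's regex, and
the sketch assumes the linker sees the ISO-2022-JP bytes rather than
character references.

```python
import re

# Hypothetical URL pattern over bytes: printable ASCII except '"', '<', '>'.
# Control bytes such as ESC (0x1B) are not part of the class.
URL_BYTES = re.compile(rb'https?://[\x21\x23-\x3B\x3D\x3F-\x7E]+')

text = "http://www.example.com/foo/bar/を見て."
encoded = text.encode("iso2022_jp")

match = URL_BYTES.match(encoded)
url = match.group(0)

print(url)                       # the part that would be linked
print(hex(encoded[len(url)]))    # the byte that stopped the match
```

Here the match stops right after the final "/", because the next byte is
the ESC that switches to the Japanese character set.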

Masao Takaku  //  masao@xxxxxxxxx
