Re: MHonArc and multi-byte characters in HTML

"Takashi P.KATOH" <p-katoh@shiratori.riec.tohoku.ac.jp> · Tue, 02 Oct 2001 11:20:13 +0900 (JST)

Hmm..., what I want to know is just whether converting to
EUC(-JP) or UTF-8 solves MHonArc's problem (e.g.,
<TITLE>$SUBJECTNA:72$</TITLE>) or not.

In other words, I just want to know which problems will be
solved by your patch and which problems will not.

That's all.
(And I believe we should make this clear before applying the
patch to MHonArc.)

Anyway,

From: Koichi Nakatani <nakatani@konica.co.jp>
Subject: Re: MHonArc and multi-byte characters in HTML
Date: Tue, 02 Oct 2001 08:50:36 +0900
>>> Instead of answering your question, I would like to ask you a question.
>>> What should be the correct "charset" parameter of HTML files generated
>>> by MHonArc?
>> Sorry but I don't understand what you mean.
>> More precisely, I don't understand the relation between my
>> question and yours.
>> In fact, what I'm talking about is NOT the charset of HTML
>> messages but how treat (process) multi-byte characters in
>> MHonArc.

Possibly I have misunderstood the `chimera' state
(I'm sure this does not mean the name of the WWW browser ;-),

> If you want to understand what I mean, you should understand the relationship
> between charset of HTML messages and how to treat multi-byte characters in
> MHonArc.
>   HTML generators like MHonArc are responsible to provide a mean to avoid
> character encoding chimera state in HTML files.  

Why?
Doesn't the charset in the original (RFC822) message work?
Of course I know few browsers support, for example,
iso-2022-jp-2, and converting to UTF-8 may help in such
situations.

But I cannot understand why converting to EUC-JP avoids
character encoding chimera state in HTML files.

Or does your `chimera' state mean the following situation?
| Subject: =?iso-8859-1?Q?......?=
| Content-Type: Text/Plain; charset=iso-2022-jp

You know, converting to EUC-JP doesn't help in such cases,
either.
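
To illustrate the mixed-charset case above: each part has to
be decoded with its own declared charset before anything can
be written out in a single encoding. The following is only a
minimal sketch in Python (not MHonArc's Perl code); the sample
message and the variable names are made up for illustration.

```python
from email import message_from_bytes
from email.header import decode_header, make_header

# Hypothetical message: iso-8859-1 encoded Subject, iso-2022-jp body.
raw = (
    b"Subject: =?iso-8859-1?Q?caf=E9?=\r\n"
    b"Content-Type: text/plain; charset=iso-2022-jp\r\n"
    b"\r\n"
    b"\x1b$BF|K\\8l\x1b(B\r\n"
)

msg = message_from_bytes(raw)

# Decode the header using the charset named in its encoded-word.
subject = str(make_header(decode_header(msg["Subject"])))

# Decode the body using the charset named in Content-Type.
body = msg.get_payload(decode=True).decode(msg.get_content_charset())

# Only now can both parts be emitted in one encoding (here UTF-8).
html = "<TITLE>{}</TITLE>\n<PRE>{}</PRE>\n".format(subject, body.strip())
html_bytes = html.encode("utf-8")
```

Note that the target here is UTF-8: EUC-JP cannot represent
every iso-8859-1 character, which is why converting to EUC-JP
does not help in this situation.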

>   Practically, that means MHonArc is responsible to provide a mean to
> generate UTF-8 files on user's choice.

I agree, and I've NEVER denied this, eh?

But I still don't understand the relationship between
charset of HTML messages and how to treat multi-byte
characters in MHonArc.

What I want to say is, if we want MHonArc to process
multi-byte characters like iso-2022-jp{,-2} (including
UTF-8) correctly, we need another functionality, for
example, by enhancing lib/iso2022jp.pl.

To put it concretely, I think we need some functions for
splitting multi-byte character strings appropriately, etc.
(I've been planning to write such code for a long time
but haven't had enough time...).
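
As a rough sketch of the kind of function I mean (in Python
here, not the Perl that lib/iso2022jp.pl would need, and with
a hypothetical function name): a length limit like the 72 in
$SUBJECTNA:72$ should count characters, not bytes, and the
result must still end in the ASCII state. This version simply
decodes, cuts, and re-encodes, relying on Python's iso2022_jp
codec to emit the escape sequences.

```python
def truncate_iso2022jp(raw: bytes, max_chars: int) -> bytes:
    """Keep at most max_chars characters of an ISO-2022-JP byte string.

    Cutting the decoded text instead of the raw bytes means a
    two-byte character is never split in half, and re-encoding
    restores the closing escape sequence back to ASCII.
    """
    text = raw.decode("iso2022_jp")
    return text[:max_chars].encode("iso2022_jp")


# Hypothetical subject line, for illustration only.
subject = "Re: 日本語のテスト".encode("iso2022_jp")
short = truncate_iso2022jp(subject, 6)
print(short.decode("iso2022_jp"))
```

Cutting the raw bytes at a fixed offset instead can land in the
middle of an escape sequence or a two-byte character, which is
exactly the breakage I am worried about.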

-- 
Takashi P.KATOH