Re: MHonArc and multi-byte characters in HTML
Hmm..., what I want to know is just whether converting to
EUC(-JP) or UTF-8 solves MHonArc's problem (e.g.,
<TITLE>$SUBJECTNA:72$</TITLE>) or not.
In other words, I just want to know which problems will be
solved by your patch and which problems will not.
That's all.
(And I believe we should make this clear before applying the
patch to MHonArc.)
Anyway,
From: Koichi Nakatani <nakatani@konica.co.jp>
Subject: Re: MHonArc and multi-byte characters in HTML
Date: Tue, 02 Oct 2001 08:50:36 +0900
>>> Instead of answering your question, I would like to ask you a question.
>>> What should be the correct "charset" parameter of HTML files generated
>>> by MHonArc?
>> Sorry but I don't understand what you mean.
>> More precisely, I don't understand the relation between my
>> question and yours.
>> In fact, what I'm talking about is NOT the charset of HTML
>> messages but how to treat (process) multi-byte characters in
>> MHonArc.
Possibly I misunderstand what you mean by the `chimera' state
(I'm sure it does not mean the WWW browser of that name ;-).
> If you want to understand what I mean, you should understand the relationship
> between charset of HTML messages and how to treat multi-byte characters in
> MHonArc.
> HTML generators like MHonArc are responsible for providing a means to avoid
> a character encoding chimera state in HTML files.
Why?
Doesn't the charset in the original (RFC822) message work?
Of course I know that few browsers support, for example,
iso-2022-jp-2, and that converting to UTF-8 may help in such
situations.
But I cannot understand why converting to EUC-JP avoids a
character encoding chimera state in HTML files.
Or does your `chimera' state mean the following situation?
| Subject: =?iso-8859-1?Q?......?=
| Content-Type: Text/Plain; charset=iso-2022-jp
You know, converting to EUC-JP doesn't help in such cases,
either.
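
To illustrate the mixed-charset case above, here is a minimal Python
sketch (not MHonArc code) that decodes the header and the body each
from its own charset and writes both out as UTF-8. The function names
and the sample subject are hypothetical; only email.header.decode_header
and html.escape are standard library calls.

```python
from email.header import decode_header
from html import escape


def header_to_text(raw_subject: str) -> str:
    """Decode RFC 2047 encoded-words in a header into one Unicode string."""
    parts = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, bytes):
            # Each encoded-word carries its own charset; plain runs are ASCII.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


def build_page(raw_subject: str, body_bytes: bytes, body_charset: str) -> bytes:
    """Combine a header and a body that use different charsets into UTF-8."""
    subject = header_to_text(raw_subject)
    body = body_bytes.decode(body_charset, errors="replace")
    html = f"<TITLE>{escape(subject)}</TITLE>\n<PRE>{escape(body)}</PRE>\n"
    return html.encode("utf-8")


# Hypothetical message: iso-8859-1 subject, iso-2022-jp body.
page = build_page(
    "=?iso-8859-1?Q?caf=E9?=",
    "日本語".encode("iso2022_jp"),
    "iso-2022-jp",
)
```

Because every part passes through Unicode first, the output file has
one well-defined charset regardless of what the message parts used.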
> Practically, that means MHonArc is responsible for providing a means to
> generate UTF-8 files at the user's choice.
I agree, and I've NEVER denied this, eh?
But I still don't understand the relationship between
charset of HTML messages and how to treat multi-byte
characters in MHonArc.
What I want to say is, if we want MHonArc to process
multi-byte characters like iso-2022-jp{,-2} (including
UTF-8) correctly, we need additional functionality, for
example, by enhancing lib/iso2022jp.pl.
To put it concretely, I think we need some functions for
splitting multi-byte character strings appropriately, etc.
(I've been planning to write such code for a long time
but haven't had enough time...).
--
Takashi P.KATOH