
Re: MHonArc and multi-byte characters in HTML



On October 6, 2001 at 16:28, Greg Matheson wrote:

> > I'm unsure how to deal with the string clipping issue with respect to
> > resource variables: e.g. $SUBJECT:72$.  I see this as a fundamental issue
> > with Perl itself since there is no built-in string type that abstracts
> > this problem (like strings in Java) in a simple and efficient manner,
> > yet.  An approach that would ignore the problem but make sure nothing
> > bad happens is to change all default resource settings to not use
> > the clipping support in resource variables.  Therefore, any clipping
> > must be explicitly specified under the advisory of the problems that
> > multi-byte character encodings may cause.  I believe I will go make
> > this kind of change to default resource settings for v2.5.
> 
> the only effect would be on half a character, which is minimal

Not necessarily.  The effect can be multiple characters depending
on the encoding that is used.  For example, for variable-width encodings,
if the clip occurs on a byte that denotes a shift, all data that follows
can be affected.  Remember, resource variables are just a part of the
entire character stream.  The text after the resource variable
can be affected in how it gets rendered.
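
As a rough illustration of the shift problem, here is a small sketch
in Python (not MHonArc's Perl code).  The sample subject, the 8-byte
limit, and the trailing " [more]" text are all made-up assumptions; the
clip is a plain byte slice standing in for something like $SUBJECT:8$.

```python
# Hypothetical subject line. ISO-2022-JP switches character sets with
# escape sequences, so the byte stream carries state.
subject = "日本語のテスト"
encoded = subject.encode("iso2022_jp")

SHIFT_IN = b"\x1b$B"   # escape: switch to JIS X 0208 (two bytes per character)
SHIFT_OUT = b"\x1b(B"  # escape: switch back to ASCII

# Byte-oriented clip (the limit of 8 is chosen only for the demo).
clipped = encoded[:8]

# The clip keeps the shift into two-byte mode, drops the shift back to
# ASCII, and also cuts the third character in half.
lost_shift_out = clipped.startswith(SHIFT_IN) and SHIFT_OUT not in clipped

# Text that follows the resource variable in the page (made up here).
page = clipped + b" [more]"
rendered = page.decode("iso2022_jp", errors="replace")

print(lost_shift_out)
print("[more]" in rendered)
```

Because the clipped value never shifts back to ASCII, the decoder is
still in two-byte mode when it reaches " [more]", so that text is no
longer rendered as written.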

> and would even indicate the variable had been clipped, so I 
> think disabling clipping in the case of multi-byte character
> encodings would be a worse cure than the disease. 

Where did I state anything would be disabled?  All that I stated is
that default resource values would be changed to not use clipping
in any resource variables.  Users will still be free to use it
if they know it is not an issue for their message data.  Note,
I believe MSGPGBEGIN is the only resource with a default value
that uses resource variable clipping: $SUBJECT:72$.

Plus, there is currently no easy way to tell which encoding is actually
in effect for a given "string", so disabling clipping only for
multi-byte encodings is not possible.  If there were, the problem would
go away: with the encoding known, the clipping code could be smart
enough to take the encoding into account.
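
To show what "smart" clipping could look like once the encoding is
known, here is a minimal sketch, again in Python rather than Perl.  The
function name clip_chars and its parameters are hypothetical, and it
assumes the charset label is reliable, which is exactly the information
that is not available today.

```python
def clip_chars(raw: bytes, charset: str, limit: int) -> bytes:
    """Clip to at most `limit` characters instead of `limit` bytes.

    Hypothetical helper: decode with the known charset, slice the
    character string, then re-encode so that any shift sequences are
    emitted complete and balanced.
    """
    text = raw.decode(charset)
    return text[:limit].encode(charset)


subject = "日本語のテスト".encode("iso2022_jp")
clipped = clip_chars(subject, "iso2022_jp", 3)

# The clipped value is a well-formed ISO-2022-JP string of 3 characters.
print(clipped.decode("iso2022_jp"))
```

Re-encoding after the slice is what restores the trailing shift back to
ASCII, so text following the variable is left alone.  Slicing by
character can still separate a base character from a combining mark,
so this is a sketch of the idea and not a complete solution.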

--ewh

