Re: XML, control characters and MHonArc



On October 5, 2007 at 08:45, Chris Hastie wrote:

> Mostly this is working fine, but I have the occasional problem with
> control characters in badly formatted emails. Specifically, a QP email
> with the string =12 - MHonArc outputs the associated control character
> to the XML. These characters are not valid in XML and the XML parser
> chokes on them.

Have you tried the TEXTENCODE resource to see how the
control characters are handled?  If you are generating XML, you may
want to use TEXTENCODE to normalize all character data to UTF-8.
See the manual for examples.

> I see a quick mention of a similar problem back in 2000:
> http://www.mhonarc.org/archive/html/mhonarc-users/2000-07/msg00040.html
> 
> Have things changed? Is there any way short of writing a custom filter,
> or hacking/patching an existing one, that I can persuade MHonArc to
> strip out XML illegal control characters?

Check the minimal API documented in an appendix of the manual.  There
is a callback you can register after a message has been converted.
Your callback can check for invalid characters and remove them.
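
To illustrate the kind of check such a callback would perform, here is
a minimal standalone sketch in Python. It is not MHonArc code and does
not use the MHonArc API (a real callback would be written in Perl);
the function name strip_xml_illegal is made up for this example. It
assumes the text has already been decoded to Unicode and removes every
character outside the XML 1.0 "Char" production, which includes the
0x12 control character produced by =12 in a QP message.

```python
import re

# Everything NOT allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text: str) -> str:
    """Return text with characters that are invalid in XML 1.0 removed.

    Hypothetical helper for illustration only; it could equally be run
    as a post-processing step over the generated output.
    """
    return _XML_ILLEGAL.sub("", text)


if __name__ == "__main__":
    # "\x12" is the control character that "=12" decodes to in QP.
    sample = "before\x12after\n"
    print(repr(strip_xml_illegal(sample)))
```

Tab, line feed and carriage return are left alone, since XML allows
them.  Whether to drop the offending characters or replace them with
a placeholder is a design choice; dropping is the simplest.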

--ewh

P.S. Please post your resource settings for creating XML.  Others
may be interested, and it may be something to include in the docs.

