
XML, control characters and MHonArc



I've recently been looking at revamping an archive by having MHonArc
output XML, which is then pulled into a PHP-based application using
XML_Unserializer.

Mostly this is working fine, but I have the occasional problem with
control characters in badly formatted emails. Specifically, when a
quoted-printable (QP) email contains the string =12, MHonArc decodes it
and outputs the corresponding control character (0x12) to the XML. Such
characters are not valid in XML 1.0, and the XML parser chokes on them.
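
To be concrete about which characters are the problem, here is a minimal
sketch in Python. It is only an illustration, not MHonArc code: the name
strip_xml_illegal is made up for this example, and the character ranges
are my reading of the Char production in the XML 1.0 specification.

```python
import quopri
import re

# Code points that fall outside the XML 1.0 "Char" production:
# C0 controls other than tab, LF and CR, lone surrogates, U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_xml_illegal(text: str) -> str:
    """Return text with characters that XML 1.0 forbids removed.

    Hypothetical helper for illustration; not part of MHonArc.
    """
    return _XML_ILLEGAL.sub("", text)


# A quoted-printable body containing =12 decodes to the control byte 0x12.
decoded = quopri.decodestring(b"bad=12char").decode("latin-1")
cleaned = strip_xml_illegal(decoded)
```

Tab, line feed and carriage return are left alone because XML 1.0 allows
them; everything else below 0x20 is dropped.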

I see a quick mention of a similar problem back in 2000:
http://www.mhonarc.org/archive/html/mhonarc-users/2000-07/msg00040.html

Have things changed? Is there any way, short of writing a custom filter
or hacking/patching an existing one, to persuade MHonArc to strip out
control characters that are illegal in XML?

If not, any hints on where to start hacking?

Thanks

-- 
Chris Hastie

