If it's too inconsistent to automate, there's really no good solution.
On 12 Dec 2010, at 18:57, David Lane wrote:
> Hello,
>
> I'm looking for a way to handle a body of existing HTML which is encoded inconsistently. On the basis that the meta tag is likely to be correct, I'd like to use that to set the HTTP content-type header's charset. I have Googled for solutions, and checked the module documents, and I don't see a way to do what I have in mind. I found a number of ways to set the header, which seem to be "more correct" and would be fine in a better situation (.htaccess files, changing file suffixes, etc.), but the tangle of existing content, CMS, users and maintenance makes me lean toward a server-based solution. Did I miss something obvious?
mod_xml2enc nearly does what you want: it'll sniff the encoding from the <meta>
element if the server doesn't set a charset in the Content-Type header. If you
just chop out the libxml2 detection (xmlParseCharEncoding), it'll do exactly
what you need. Alternatively, you can use it together with a libxml2-consumer
module like mod_proxy_html, which both deals with the charset issue and offers
explicit <meta http-equiv> support.
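
To make "sniff the encoding from the <meta>" concrete, here is a minimal Python sketch of the idea. It is not mod_xml2enc's code and does not reflect its internals; the function names, the regular expression, and the 1024-byte scan limit are all assumptions chosen for illustration. It looks for a charset in a <meta> tag near the top of the document and uses it only when the server has not already set one.

```python
import re

# Hypothetical pattern: matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(body, limit=1024):
    """Return the charset named in a <meta> tag near the start of body, or None."""
    # Only the first `limit` bytes are scanned (an assumed cut-off).
    match = _META_CHARSET.search(body[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


def content_type_for(body, server_charset=None):
    """Build a Content-Type value, preferring a charset the server already set."""
    charset = server_charset or sniff_meta_charset(body)
    if charset is None:
        return "text/html"
    return "text/html; charset=" + charset
```

For a page whose head contains <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">, content_type_for(page) would give "text/html; charset=iso-8859-1". A regex like this is easily fooled by unusual markup, so treat it as a description of the behaviour rather than something to deploy.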
--
Nick Kew