On 11/16/2012 05:12 PM, Nick Kew wrote:
Sorry for the delay on this. The basic problem remains: If I enable html rewriting and connect with a client requesting content compression the reverse proxy will fail with a message pointing at libxml2/encoding. I can also see different log entries depending on whether I set the charset of the page.

> On Fri, 16 Nov 2012 11:31:38 +0100 Thomas Eckert <Thomas.Eckert@xxxxxxxxxx> wrote:
>> Thanks for the hint but unfortunately "manually" adding xml2enc to the filtering chain does not help.
>
> Looks like you've got problems over and above anything to do with your configuration!
>
>> "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly
>
> I thought you said it had charset issues?
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(259): [client 10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
>
> That seems implausible. How do you get a libxml2 install that doesn't natively support ISO-8859-1 (latin1)?
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
>> [pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping invalid byte(s) in input stream!
>> (and more conversion errors)
>
> It looks as if your backend incorrectly identifies the charset of the page in question. Either that or you found a bug. Do you have a URL where your unprocessed page could be viewed?
So if I just send the page with "Content-Type: text/html" this is what I get:

mod_deflate.c(1283): [client 10.10.10.10:39771] AH01398: Zlib: Inflated 348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:39771] AH01430: Content-Type is text/html
mod_xml2enc.c(259): [client 10.10.10.10:39771] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: consuming 682 bytes from bucket
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc: converted 682/682 bytes
mod_deflate.c(763): [client 10.10.10.10:39771] AH01384: Zlib: Compressed 668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: consuming 10 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc: converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: consuming 344 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(481): [client 10.10.10.10:39771] AH01440: xml2enc: reinserting 334 unconsumed bytes from bucket
[client 10.10.10.10:39771] AH01385: Zlib error -2 flushing zlib output buffer ((null))
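
One thing that stands out in the log above: after "Compressed 668 to 344", xml2enc consumes 10 bytes and then 344 bytes, which would fit a gzip header followed by the compressed body. That reading is only an assumption on my part, not a diagnosis. The Python sketch below is not taken from mod_xml2enc and does not model it; it only shows that gzip output is not valid input for a text converter. The page content (`page`) is made up and `is_valid_utf8` is a hypothetical helper.

```python
import gzip


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up stand-in for the backend page: plain ASCII HTML.
page = (
    b"<html><head><title>Test</title></head><body>"
    + b"<p>plain ascii paragraph</p>" * 20
    + b"</body></html>"
)

# What a compressing filter would hand to anything placed after it.
compressed = gzip.compress(page)

print(is_valid_utf8(page))        # the uncompressed page is fine as text
print(is_valid_utf8(compressed))  # the gzip stream is not text at all
print(compressed[:2].hex())       # gzip magic bytes at the start of the stream
```

If a charset converter really does see the compressed stream, a run of "invalid byte" complaints like the ones above is what I would expect.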
But if "Content-Type: text/html; charset=ISO-8859-1" is sent this is what I get:

mod_deflate.c(1283): [client 10.10.10.10:40040] AH01398: Zlib: Inflated 348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:40040] AH01430: Content-Type is text/html;charset=ISO-8859-1
[client 10.10.10.10:40040] AH01431: Got charset ISO-8859-1 from HTTP headers
mod_deflate.c(763): [client 10.10.10.10:40040] AH01384: Zlib: Compressed 668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc: consuming 10 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): [client 10.10.10.10:40040] AH01441: xml2enc: converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc: consuming 344 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(481): [client 10.10.10.10:40040] AH01440: xml2enc: reinserting 334 unconsumed bytes from bucket
From what I can tell, this still seems to be the "wrong" processing, as the page cannot be inflated correctly at the user's end. Nevertheless the message

AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate

does not show up anymore. Looking at mod_xml2enc.c +185-194 and +251-268 that makes sense, but it would imply the encoding detection in +198-206 failed. I suggest adding some sort of "failed" debug message in case xmlDetectCharEncoding() didn't work as desired.
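
To compare the two runs I find it easier to diff the AH message codes than to read the logs side by side. A minimal Python sketch follows; `ah_codes` is a hypothetical helper, and the two strings hold only the charset-related lines copied from the logs above, not the full output.

```python
import re

# Apache httpd message tags look like "AH" followed by five digits.
AH_CODE = re.compile(r"\bAH\d{5}\b")


def ah_codes(log_text: str) -> set:
    """Collect the distinct AH message codes that appear in a log excerpt."""
    return set(AH_CODE.findall(log_text))


# Excerpt from the run with "Content-Type: text/html" (no charset).
run_without_charset = """\
mod_xml2enc.c(183): [client 10.10.10.10:39771] AH01430: Content-Type is text/html
mod_xml2enc.c(259): [client 10.10.10.10:39771] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
"""

# Excerpt from the run with "Content-Type: text/html; charset=ISO-8859-1".
run_with_charset = """\
mod_xml2enc.c(183): [client 10.10.10.10:40040] AH01430: Content-Type is text/html;charset=ISO-8859-1
[client 10.10.10.10:40040] AH01431: Got charset ISO-8859-1 from HTTP headers
"""

only_without_charset = ah_codes(run_without_charset) - ah_codes(run_with_charset)
only_with_charset = ah_codes(run_with_charset) - ah_codes(run_without_charset)

print(sorted(only_without_charset))  # codes seen only when no charset is sent
print(sorted(only_with_charset))     # codes seen only when the charset is sent
```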
I've tried a couple more combinations, including using mod_charset_lite and different non-latin1 encodings on the backend, but the only thing that works is using the Header directive on the backend to set "Content-Type: text/html; charset=UTF-8" while leaving the actual contents unchanged. Here, "works" means the page is displayed correctly at the client's end.
The goal is still to get mod_proxy_html to rewrite the HTML just like it would do with "ProxyHTMLEnable On" while retaining compression support. So setting

SetOutputFilter INFLATE;proxy-html

which "drops out" the "xml2enc" filter, might be problematic.

Unfortunately, the page is not publicly accessible. It is rather simple, though, and I made sure there is nothing 'special' on that page, e.g. it's just plain ASCII, no meta tags, etc.
Note, I tried both "ProxyHTMLEnable On" and "SetOutputFilter INFLATE;proxy-html" as filter directives for all above mentioned setups. Neither worked except with the mentioned forced UTF-8 header.
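
Since the page is plain ASCII, relabelling it as UTF-8 without touching the body should be harmless, because ASCII bytes decode to the same text under ISO-8859-1 and UTF-8. A small Python check of that claim, using a made-up stand-in (`page`) because the real page is not public; both helper names are hypothetical:

```python
def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)


def decodes_identically(data: bytes,
                        encodings=("ascii", "iso-8859-1", "utf-8")) -> bool:
    """True if the bytes yield the same text under every listed encoding."""
    texts = {data.decode(encoding) for encoding in encodings}
    return len(texts) == 1


# Made-up stand-in for the backend page.
page = b"<html><head><title>Test</title></head><body><p>Hello</p></body></html>"

print(is_plain_ascii(page))       # the stand-in page is pure ASCII
print(decodes_identically(page))  # so the charset label does not change the text

# For contrast: a latin1-encoded byte outside ASCII is where labels start to matter.
non_ascii = "caf\u00e9".encode("iso-8859-1")
print(is_plain_ascii(non_ascii))
```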