Re: [patch] Apache converts GZIPed data into UTF-8 - 2nd Act

On Mon, Apr 15, 2019 at 11:43:21PM +0100, Nick Kew wrote:

Hi Nick,

! OK, I've looked.

me too. ;)

! What I'd like to do - pass responsibility back to the module
! that inserted the xml2enc filter - calls for a minor API
! change, so isn't going to happen in 2.4.x.  A variant on
! that approach might work, but right now I don't see anything
! better than replicating mod_proxy_html's logic in mod_xml2enc
! to deal with the situation where they're interacting.
! 
! Your check on content-encoding also looks good.
! Except that unless I'm missing something, your use of f->r->notes
! is unnecessary: ap_remove_output_filter means we don't revisit
! that code!

Yes, it would be unnecessary, were it not for a different problem: my
code is currently not in the proper place.
Given a chain DEFLATE;XML2ENC;INFLATE it looks like this:

[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:126 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'inflate' matched
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:127 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'xml2enc' matched
[xml2enc:debug] [pid 77874] mod_xml2enc.c(176): [client 192.168.97.18:65401] AH01430: Content-Type is text/css
[xml2enc:debug] [pid 77874] mod_xml2enc.c(250): [client 192.168.97.18:65401] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:130 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'deflate' matched
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[deflate:debug] [pid 77874] mod_deflate.c(1622): [client 192.168.97.18:65401] AH01398: Zlib: Inflated 6176 to 28247 : URL /fin-stage/assets/application-3a5821b5be536e0108d5934c96815299001dfa3c1ddff9f39676a3a3126d8190.css
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 3959 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 3959/3959 bytes
[deflate:debug] [pid 77874] mod_deflate.c(854): [client 192.168.97.18:65401] AH01384: Zlib: Compressed 28247 to 6226 : URL /fin-stage/assets/application-3a5821b5be536e0108d5934c96815299001dfa3c1ddff9f39676a3a3126d8190.css

Currently my snippet is run for each of these chunks of data
(which is not a good idea, but I did not expect to understand the
code fully enough to find a better place). So, with DEFLATE running
behind it, by the time the second chunk arrives DEFLATE has already
put the "gzip" header back in, and I watched xml2enc quit in the
middle of the document.
That's why I put the f->r->notes check in.
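
To make the ordering problem concrete, here is a toy model in Python. It
is not mod_xml2enc code: Request, Xml2encModel and deflate_model are
hypothetical stand-ins, and the sketch only assumes that the charset
filter is called once per chunk while a later filter re-adds the header.

```python
class Request:
    """Hypothetical stand-in for the per-request state shared by all filters."""
    def __init__(self, headers):
        self.headers_out = dict(headers)


class Xml2encModel:
    """Toy model of a charset filter that is invoked once per chunk."""
    def __init__(self, remember_decision):
        self.remember_decision = remember_decision
        self.decision = None  # None means "not decided yet"

    def is_encoded(self, request):
        value = request.headers_out.get("Content-Encoding", "identity")
        return value.strip().lower() != "identity"

    def process(self, request, chunk):
        # Re-check the header on every chunk unless told to remember.
        if self.decision is None or not self.remember_decision:
            self.decision = "skip" if self.is_encoded(request) else "convert"
        return self.decision


def deflate_model(request):
    """Toy model of a later filter that puts the gzip header back."""
    request.headers_out["Content-Encoding"] = "gzip"


def run(remember_decision, chunks=3):
    request = Request({})  # assume INFLATE already removed Content-Encoding
    xml2enc = Xml2encModel(remember_decision)
    outcome = []
    for _ in range(chunks):
        outcome.append(xml2enc.process(request, b"chunk"))
        deflate_model(request)  # runs after xml2enc has seen the chunk
    return outcome


print(run(remember_decision=False))  # ['convert', 'skip', 'skip']
print(run(remember_decision=True))   # ['convert', 'convert', 'convert']
```

Without a remembered decision the model stops converting after the first
chunk, which matches what I observed. Remembering the first decision
plays the role that the note in f->r->notes plays in my patch.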

Another minor flaw is that the test for "Content-Encoding: identity" 
(btw: does anybody use that?) is probably not case-insensitive.
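
For illustration, a case-insensitive test could look roughly like the
following. This is a Python sketch, not the C from my patch; is_identity
is a hypothetical helper, and treating a missing header like "identity"
is an assumption of the sketch.

```python
def is_identity(content_encoding):
    """Return True if the header value means 'no content coding applied'."""
    if content_encoding is None:
        return True  # assumption: an absent header counts as identity
    # Content-coding names are case-insensitive, and a header may list several.
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    return all(c == "identity" for c in codings)


print(is_identity("Identity"))  # True
print(is_identity("GZIP"))      # False
```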

And then I was thinking about a different and probably better approach:
if we can check the first few bytes of the actual document
beforehand, we can test these against the signatures of the usual
compression algorithms (in the same way as the "file" command does it
on Unix). This seems safer than relying on header information.

After all, I see no reason why an HTML document might not also be
compressed, and then it would not help to just stop processing CSS
documents.
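
A minimal sketch of such a signature test, in Python rather than the
module's C. SIGNATURES and sniff_compression are hypothetical names, and
the table is only a small selection of well-known magic numbers, not
something taken from any existing Apache code.

```python
import gzip

# Leading bytes ("magic numbers") of some common compressed formats.
SIGNATURES = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
)


def sniff_compression(head):
    """Return the name of the format whose signature starts `head`, else None."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None


sample = gzip.compress(b"body { color: red }")
print(sniff_compression(sample))                  # gzip
print(sniff_compression(b"body { color: red }"))  # None
```

This has limits: some codings have no signature to test for (Brotli, or
raw deflate data), and a short signature such as "BZh" could in theory
also begin a text document. So it would complement the header check
rather than replace it.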

Btw, concerning this message, I had a look at that one, too:
   AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate

It seems to me that this message is reached only because the document
is compressed (and libxml2 obviously cannot find a charset in
that); it is just the message text that is misleading.
Maybe a conservative approach would be to just stop at that point
and give up, because compression might not be the only issue here:
people might get the idea to use some end-to-end encryption for
certain documents, and that would also appear as binary data that we
must not tamper with...
(just thinking aloud)

cheerio,
PMc
