On 9/30/17, Jeffrey Merkey <jeffmerkey@xxxxxxxxx> wrote:
> On 9/30/17, Rafael Akchurin <rafael.akchurin@xxxxxxxxxxxx> wrote:
>> Hello Jeff,
>>
>> Do not forget Google and YouTube are now using brotli encoding
>> extensively, not only gzip.
>>
>> Best regards,
>> Rafael Akchurin
>>
>>> On 30 Sep 2017, at 23:49, Jeffrey Merkey <jeffmerkey@xxxxxxxxx> wrote:
>>>
>>>> On 9/30/17, Eliezer Croitoru <eliezer@xxxxxxxxxxxx> wrote:
>>>> Hey Jeffrey,
>>>>
>>>> What happens when you disable the following icap service this way:
>>>>
>>>> icap_service service_avi_resp respmod_precache icap://127.0.0.1:1344/cherokee bypass=0
>>>> adaptation_access service_avi_resp deny all
>>>>
>>>> Is it still the same?
>>>> What I suspect is that the requests are defined to accept gzip-compressed
>>>> objects and the icap service is not "gunzip"-ing them, which results in
>>>> what you see.
>>>>
>>>> To make sure that squid is not at fault here, try disabling both icap
>>>> services and then adding them back one at a time to see which corner of
>>>> this triangle is giving you trouble.
>>>> I enhanced an ICAP library written in GoLang at:
>>>> https://github.com/elico/icap
>>>>
>>>> And I have a couple of examples of how to work with http requests and
>>>> responses at:
>>>> https://github.com/andybalholm/redwood/
>>>> https://github.com/andybalholm/redwood/search?utf8=%E2%9C%93&q=gzip&type=
>>>>
>>>> Let me know if you need help finding out the issue.
>>>>
>>>> All The Bests,
>>>> Eliezer
>>>>
>>>> ----
>>>> Eliezer Croitoru
>>>> Linux System Administrator
>>>> Mobile: +972-5-28704261
>>>> Email: eliezer@xxxxxxxxxxxx
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: squid-users [mailto:squid-users-bounces@xxxxxxxxxxxxxxxxxxxxx] On
>>>> Behalf Of Jeffrey Merkey
>>>> Sent: Saturday, September 30, 2017 23:28
>>>> To: squid-users <squid-users@xxxxxxxxxxxxxxxxxxxxx>
>>>> Subject: SSL Bump Failures with Google and Wikipedia
>>>>
>>>> Hello All,
>>>>
>>>> I have been working with the squid server and icap, and I have been
>>>> running into problems with content cached from google and wikipedia.
>>>> Some https sites, such as Centos.org, work perfectly with ssl bumping
>>>> and I get the decrypted content as readable html. Other sites, such as
>>>> google and wikipedia, return what looks like encrypted traffic or
>>>> perhaps mime-encoded data; I am not sure which.
>>>>
>>>> Are there cases where squid will default to direct mode and not
>>>> decrypt the traffic? I am using the latest squid server, 3.5.27. I
>>>> really would like to get this working with google and wikipedia. I
>>>> reviewed the page source in the browser's viewer and it looks nothing
>>>> like the data I am getting via the icap server.
>>>>
>>>> Any assistance would be greatly appreciated.
>>>>
>>>> The config I am using is:
>>>>
>>>> #
>>>> # Recommended minimum configuration:
>>>> #
>>>>
>>>> # Example rule allowing access from your local networks.
>>>> # Adapt to list your (internal) IP networks from where browsing
>>>> # should be allowed
>>>>
>>>> acl localnet src 127.0.0.1
>>>> acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
>>>> acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
>>>> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
>>>> acl localnet src fc00::/7       # RFC 4193 local private network range
>>>> acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines
>>>>
>>>> acl SSL_ports port 443
>>>> acl Safe_ports port 80          # http
>>>> acl Safe_ports port 21          # ftp
>>>> acl Safe_ports port 443         # https
>>>> acl Safe_ports port 70          # gopher
>>>> acl Safe_ports port 210         # wais
>>>> acl Safe_ports port 1025-65535  # unregistered ports
>>>> acl Safe_ports port 280         # http-mgmt
>>>> acl Safe_ports port 488         # gss-http
>>>> acl Safe_ports port 591         # filemaker
>>>> acl Safe_ports port 777         # multiling http
>>>> acl CONNECT method CONNECT
>>>>
>>>> #
>>>> # Recommended minimum Access Permission configuration:
>>>> #
>>>> # Deny requests to certain unsafe ports
>>>> http_access deny !Safe_ports
>>>>
>>>> # Deny CONNECT to other than secure SSL ports
>>>> http_access deny CONNECT !SSL_ports
>>>>
>>>> # Only allow cachemgr access from localhost
>>>> http_access allow localhost manager
>>>> http_access deny manager
>>>>
>>>> # We strongly recommend the following be uncommented to protect innocent
>>>> # web applications running on the proxy server who think the only
>>>> # one who can access services on "localhost" is a local user
>>>> #http_access deny to_localhost
>>>>
>>>> #
>>>> # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
>>>> #
>>>>
>>>> # Example rule allowing access from your local networks.
>>>> # Adapt localnet in the ACL section to list your (internal) IP networks
>>>> # from where browsing should be allowed
>>>> http_access allow localnet
>>>> http_access allow localhost
>>>>
>>>> # And finally deny all other access to this proxy
>>>> http_access deny all
>>>>
>>>> # Squid normally listens to port 3128
>>>> #http_port 3128
>>>>
>>>> # Uncomment and adjust the following to add a disk cache directory.
>>>> #cache_dir ufs /usr/local/squid/var/cache/squid 100 16 256
>>>>
>>>> # Leave coredumps in the first cache dir
>>>> coredump_dir /usr/local/squid/var/cache/squid
>>>>
>>>> #
>>>> # Add any of your own refresh_pattern entries above these.
>>>> #
>>>> refresh_pattern ^ftp:             1440  20%  10080
>>>> refresh_pattern ^gopher:          1440   0%   1440
>>>> refresh_pattern -i (/cgi-bin/|\?)    0   0%      0
>>>> refresh_pattern .                    0  20%   4320
>>>>
>>>> http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/etc/squid/ssl_cert/myCA.pem
>>>> http_port 3129
>>>>
>>>> # SSL Bump Config
>>>> always_direct allow all
>>>> ssl_bump server-first all
>>>> sslproxy_cert_error deny all
>>>> sslproxy_flags DONT_VERIFY_PEER
>>>> sslcrtd_program /usr/local/squid/libexec/ssl_crtd -s /var/lib/ssl_db -M 4MB
>>>> sslcrtd_children 8 startup=1 idle=1
>>>>
>>>> # For squid 3.5.x
>>>> #sslcrtd_program /usr/local/squid/libexec/ssl_crtd -s /var/lib/ssl_db -M 4MB
>>>>
>>>> # For squid 4.x
>>>> #sslcrtd_program /usr/local/squid/libexec/security_file_certgen -s /var/lib/ssl_db -M 4MB
>>>>
>>>> icap_enable on
>>>> icap_send_client_ip on
>>>> icap_send_client_username on
>>>> icap_client_username_header X-Authenticated-User
>>>> icap_preview_enable on
>>>> icap_preview_size 1024
>>>> icap_service service_avi_req reqmod_precache icap://127.0.0.1:1344/request bypass=1
>>>> adaptation_access service_avi_req allow all
>>>>
>>>> icap_service service_avi_resp respmod_precache icap://127.0.0.1:1344/cherokee bypass=0
>>>> adaptation_access service_avi_resp allow all
>>>>
>>>> Jeff
>>>> _______________________________________________
>>>> squid-users mailing list
>>>> squid-users@xxxxxxxxxxxxxxxxxxxxx
>>>> http://lists.squid-cache.org/listinfo/squid-users
>>>
>>> Eliezer,
>>>
>>> Well, you certainly hit the nail on the head. I added the following
>>> code to check the content being sent to the icap server from squid,
>>> and here is what I found when I checked the headers sent by the remote
>>> web server.
>>>
>>> Code added to c-icap to check the content type and encoding received
>>> by the icap server:
>>>
>>>     hdrs = ci_http_response_headers(req);
>>>     content_type = ci_headers_value(hdrs, "Content-Type");
>>>     if (content_type)
>>>         ci_debug_printf(1, "srv_cherokee: content-type: %s\n", content_type);
>>>
>>>     content_encoding = ci_headers_value(hdrs, "Content-Encoding");
>>>     if (content_encoding)
>>>         ci_debug_printf(1, "srv_cherokee: content-encoding: %s\n", content_encoding);
>>>
>>> And the output from scanned pages sent over from squid:
>>>
>>>     srv_cherokee: init request 0x7f3dbc008eb0
>>>     pool hits:1 allocations: 1
>>>     Allocating from objects pool object 5
>>>     pool hits:1 allocations: 1
>>>     Geting buffer from pool 4096:1
>>>     Requested service: cherokee
>>>     Read preview data if there are and process request
>>>     srv_cherokee: content-type: text/html; charset=utf-8
>>>     srv_cherokee: content-encoding: gzip   <-- As you stated, I am getting gzipped data
>>>     srv_cherokee: we expect to read :-1 body data
>>>     Allow 204...
>>>     Preview handler return allow 204 response
>>>     srv_cherokee: release request 0x7f3dbc008eb0
>>>     Store buffer to long pool 4096:1
>>>     Storing to objects pool object 5
>>>     Log request to access log file /var/log/i-cap_access.log
>>>
>>> Wikipedia at https://en.wikipedia.org/wiki/HTTP_compression describes
>>> the process as:
>>>
>>> "...
>>> Compression scheme negotiation
>>>
>>> In most cases, excluding the SDCH, the negotiation is done in two
>>> steps, described in RFC 2616:
>>>
>>> 1. The web client advertises which compression schemes it supports by
>>> including a list of tokens in the HTTP request. For Content-Encoding,
>>> the list is in a field called Accept-Encoding; for Transfer-Encoding,
>>> the field is called TE.
>>>
>>>     GET /encrypted-area HTTP/1.1
>>>     Host: www.example.com
>>>     Accept-Encoding: gzip, deflate
>>>
>>> 2. If the server supports one or more compression schemes, the
>>> outgoing data may be compressed by one or more methods supported by
>>> both parties. If this is the case, the server will add a
>>> Content-Encoding or Transfer-Encoding field in the HTTP response with
>>> the used schemes, separated by commas.
>>>
>>>     HTTP/1.1 200 OK
>>>     Date: mon, 26 June 2016 22:38:34 GMT
>>>     Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
>>>     Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>>>     Accept-Ranges: bytes
>>>     Content-Length: 438
>>>     Connection: close
>>>     Content-Type: text/html; charset=UTF-8
>>>     Content-Encoding: gzip
>>>
>>> The web server is by no means obligated to use any compression method;
>>> this depends on the internal settings of the web server and also may
>>> depend on the internal architecture of the website in question.
>>>
>>> In case of SDCH a dictionary negotiation is also required, which may
>>> involve additional steps, like downloading a proper dictionary from
>>> ..."
>>>
>>> So it looks like this behavior is driven by the browser's request
>>> headers. Is it possible to have squid gunzip the data, or to strip the
>>> "Accept-Encoding: gzip, deflate" header from the request so the remote
>>> server is never told to gzip the data?
>>>
>>> Thanks
>>>
>>> Jeff
>>> _______________________________________________
>>> squid-users mailing list
>>> squid-users@xxxxxxxxxxxxxxxxxxxxx
>>> http://lists.squid-cache.org/listinfo/squid-users
>>
>
> Well,
>
> After reviewing this problem and all of the great technical information
> folks provided, I have it working, and I figured out the best way to
> deal with this transparently: allow squid to spoof the server side with
> modified request headers.
>
> Compile squid with the flag:
>
> --enable-http-violations
>
> then add the following to the squid.conf file:
>
> # disable remote html data compression by replacing HTTP request headers
> # requires squid build option --enable-http-violations
> request_header_access Accept-Encoding deny all
> request_header_replace Accept-Encoding *;q=0
>
> These two directives tell squid to strip all Accept-Encoding request
> headers and substitute the string "*;q=0", which tells the server not
> to send any compressed data. I tested this with chrome, which was
> configured to always send Accept-Encoding: gzip,deflate, and all of
> the C-ICAP data I am seeing is plain text html, which is what I wanted.
>
> So adding those two header directives transparently spoofs the remote
> server into always sending uncompressed data. I did note that google
> has some nonsense going on with chrome (mozilla does not do this): even
> when chrome's request headers have been rewritten by squid to refuse
> compression, the google servers will still send a Content-Encoding:
> gzip response header on responses which do not contain any body data
> (???), which is clearly a bug of some sort in their HTTP responses.
>
> The browser treats the data as plain text and works correctly, even
> though it gets a Content-Encoding: gzip header. Google only seems to do
> this on header-only request/responses which have no body text.
> Responses which actually contain body data omit the Content-Encoding
> header, which is what I wanted to see happen.
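>
> To verify from the icap side that squid is really rewriting the header
> before it leaves the proxy, the same kind of check I added for
> responses can be put into the reqmod service. A small sketch along the
> lines of the response-side code above (it assumes
> ci_http_request_headers() as the request-side counterpart of the
> ci_http_response_headers() call used earlier):
>
>     /* Log the Accept-Encoding header as squid forwards it.  With the
>        request_header_replace rule above in effect, this should print
>        "*;q=0" instead of "gzip, deflate". */
>     ci_headers_list_t *req_hdrs = ci_http_request_headers(req);
>     if (req_hdrs) {
>         const char *accept_encoding = ci_headers_value(req_hdrs, "Accept-Encoding");
>         if (accept_encoding)
>             ci_debug_printf(1, "srv_request: accept-encoding: %s\n",
>                             accept_encoding);
>     }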
>
> So to summarize, the above changes will let squid filter and spoof
> Accept-Encoding: HTTP request headers to tell the server to always
> send uncompressed data, and from my testing it works transparently
> with Chrome and Mozilla, with C-ICAP getting the uncompressed,
> unencrypted data it needs.
>
> Jeff
>

One caveat about this: I discovered that there are quite a few websites
which completely ignore the Accept-Encoding request header and just go
ahead and send gzipped html data even when you tell them not to. (A
possible fallback, inflating the gzip body inside the ICAP service
itself, is sketched after this message.) Oh well, back to the drawing
board.

Jeff
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users
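
For sites that ignore Accept-Encoding, the remaining option is to
inflate the gzip body inside the ICAP service itself, which is
essentially what Eliezer hinted at above. A minimal, self-contained
sketch using plain zlib (link with -lz); gunzip_buf() and its one-shot
buffer handling are illustrative only, not part of the c-icap API, and
a real service would need to grow the output buffer and cope with
bodies arriving in chunks:

    #include <string.h>
    #include <zlib.h>

    /* Inflate a gzip-compressed buffer in one shot.  Returns the number
     * of bytes written to 'out', or -1 on error.  The 16 + MAX_WBITS
     * windowBits value tells zlib to expect a gzip header rather than a
     * raw zlib stream. */
    static int gunzip_buf(const unsigned char *in, size_t in_len,
                          unsigned char *out, size_t out_len)
    {
        z_stream strm;
        int ret;

        memset(&strm, 0, sizeof(strm));
        if (inflateInit2(&strm, 16 + MAX_WBITS) != Z_OK)
            return -1;

        strm.next_in = (unsigned char *)in;
        strm.avail_in = in_len;
        strm.next_out = out;
        strm.avail_out = out_len;

        ret = inflate(&strm, Z_FINISH);
        inflateEnd(&strm);

        if (ret != Z_STREAM_END)  /* truncated, corrupt, or 'out' too small */
            return -1;
        return (int)(out_len - strm.avail_out);
    }

The respmod service would call this once the whole body has been
buffered, when the Content-Encoding header says gzip, and hand the
inflated html to the scanner instead of the raw compressed stream.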