Re: SSL Bump Failures with Google and Wikipedia [SOLVED]

On 9/30/17, Jeffrey Merkey <jeffmerkey@xxxxxxxxx> wrote:
> On 9/30/17, Rafael Akchurin <rafael.akchurin@xxxxxxxxxxxx> wrote:
>> Hello Jeff,
>>
>> Do not forget Google and YouTube are now using brotli encoding
>> extensively, not only gzip.
>>
>> Best regards,
>> Rafael Akchurin
>>
>>> On 30 Sep 2017 at 23:49, Jeffrey Merkey <jeffmerkey@xxxxxxxxx> wrote:
>>>
>>>> On 9/30/17, Eliezer Croitoru <eliezer@xxxxxxxxxxxx> wrote:
>>>> Hey Jeffrey,
>>>>
>>>> What happens when you disable the respmod icap service this way:
>>>> icap_service service_avi_resp respmod_precache icap://127.0.0.1:1344/cherokee bypass=0
>>>> adaptation_access service_avi_resp deny all
>>>>
>>>> Is it still the same?
>>>> What I suspect is that the requests declare that they accept
>>>> gzip-compressed objects and the icap service is not gunzipping them,
>>>> which results in what you see.
>>>>
>>>> To make sure that squid is not at fault here, try disabling both icap
>>>> services and then adding them back one at a time to see which part of
>>>> this triangle is giving you trouble.
>>>> I enhanced an ICAP library written in Go, available at:
>>>> https://github.com/elico/icap
>>>>
>>>> I also have a couple of examples of how to work with HTTP requests and
>>>> responses at:
>>>> https://github.com/andybalholm/redwood/
>>>> https://github.com/andybalholm/redwood/search?utf8=%E2%9C%93&q=gzip&type=
>>>>
>>>> Let me know if you need help tracking down the issue.
>>>>
>>>> All The Bests,
>>>> Eliezer
>>>>
>>>> ----
>>>> Eliezer Croitoru
>>>> Linux System Administrator
>>>> Mobile: +972-5-28704261
>>>> Email: eliezer@xxxxxxxxxxxx
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: squid-users [mailto:squid-users-bounces@xxxxxxxxxxxxxxxxxxxxx] On
>>>> Behalf Of Jeffrey Merkey
>>>> Sent: Saturday, September 30, 2017 23:28
>>>> To: squid-users <squid-users@xxxxxxxxxxxxxxxxxxxxx>
>>>> Subject:  SSL Bump Failures with Google and Wikipedia
>>>>
>>>> Hello All,
>>>>
>>>> I have been working with the squid server and icap, and I have been
>>>> running into problems with content cached from Google and Wikipedia.
>>>> Some sites using https, such as Centos.org, work perfectly with ssl
>>>> bumping, and I get the decrypted content as readable html.  Other
>>>> sites, such as Google and Wikipedia, return what looks like encrypted
>>>> traffic or perhaps mime-encoded data; I am not sure which.
>>>>
>>>> Are there cases where squid will default to direct mode and not
>>>> decrypt the traffic?  I am using the latest squid server, 3.5.27.  I
>>>> really would like to get this working with Google and Wikipedia.  I
>>>> reviewed the page source in the browser's viewer, and it looks
>>>> nothing like the data I am getting via the icap server.
>>>>
>>>> Any assistance would be greatly appreciated.
>>>>
>>>> The config I am using is:
>>>>
>>>> #
>>>> # Recommended minimum configuration:
>>>> #
>>>>
>>>> # Example rule allowing access from your local networks.
>>>> # Adapt to list your (internal) IP networks from where browsing
>>>> # should be allowed
>>>>
>>>> acl localnet src 127.0.0.1
>>>> acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
>>>> acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
>>>> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
>>>> acl localnet src fc00::/7       # RFC 4193 local private network range
>>>> acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines
>>>>
>>>> acl SSL_ports port 443
>>>> acl Safe_ports port 80          # http
>>>> acl Safe_ports port 21          # ftp
>>>> acl Safe_ports port 443         # https
>>>> acl Safe_ports port 70          # gopher
>>>> acl Safe_ports port 210         # wais
>>>> acl Safe_ports port 1025-65535  # unregistered ports
>>>> acl Safe_ports port 280         # http-mgmt
>>>> acl Safe_ports port 488         # gss-http
>>>> acl Safe_ports port 591         # filemaker
>>>> acl Safe_ports port 777         # multiling http
>>>> acl CONNECT method CONNECT
>>>>
>>>> #
>>>> # Recommended minimum Access Permission configuration:
>>>> #
>>>> # Deny requests to certain unsafe ports
>>>> http_access deny !Safe_ports
>>>>
>>>> # Deny CONNECT to other than secure SSL ports
>>>> http_access deny CONNECT !SSL_ports
>>>>
>>>> # Only allow cachemgr access from localhost
>>>> http_access allow localhost manager
>>>> http_access deny manager
>>>>
>>>> # We strongly recommend the following be uncommented to protect innocent
>>>> # web applications running on the proxy server who think the only
>>>> # one who can access services on "localhost" is a local user
>>>> #http_access deny to_localhost
>>>>
>>>> #
>>>> # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
>>>> #
>>>>
>>>> # Example rule allowing access from your local networks.
>>>> # Adapt localnet in the ACL section to list your (internal) IP networks
>>>> # from where browsing should be allowed
>>>> http_access allow localnet
>>>> http_access allow localhost
>>>>
>>>> # And finally deny all other access to this proxy
>>>> http_access deny all
>>>>
>>>> # Squid normally listens to port 3128
>>>> #http_port 3128
>>>>
>>>> # Uncomment and adjust the following to add a disk cache directory.
>>>> #cache_dir ufs /usr/local/squid/var/cache/squid 100 16 256
>>>>
>>>> # Leave coredumps in the first cache dir
>>>> coredump_dir /usr/local/squid/var/cache/squid
>>>>
>>>> #
>>>> # Add any of your own refresh_pattern entries above these.
>>>> #
>>>> refresh_pattern ^ftp:           1440    20%     10080
>>>> refresh_pattern ^gopher:        1440    0%      1440
>>>> refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
>>>> refresh_pattern .               0       20%     4320
>>>>
>>>> http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/etc/squid/ssl_cert/myCA.pem
>>>> http_port 3129
>>>>
>>>> # SSL Bump Config
>>>> always_direct allow all
>>>> ssl_bump server-first all
>>>> sslproxy_cert_error deny all
>>>> sslproxy_flags DONT_VERIFY_PEER
>>>> sslcrtd_program /usr/local/squid/libexec/ssl_crtd -s /var/lib/ssl_db -M 4MB
>>>> sslcrtd_children 8 startup=1 idle=1
>>>>
>>>> # For squid 3.5.x
>>>> #sslcrtd_program /usr/local/squid/libexec/ssl_crtd -s /var/lib/ssl_db -M 4MB
>>>>
>>>> # For squid 4.x
>>>> # sslcrtd_program /usr/local/squid/libexec/security_file_certgen -s /var/lib/ssl_db -M 4MB
>>>>
>>>> icap_enable on
>>>> icap_send_client_ip on
>>>> icap_send_client_username on
>>>> icap_client_username_header X-Authenticated-User
>>>> icap_preview_enable on
>>>> icap_preview_size 1024
>>>> icap_service service_avi_req reqmod_precache icap://127.0.0.1:1344/request bypass=1
>>>> adaptation_access service_avi_req allow all
>>>>
>>>> icap_service service_avi_resp respmod_precache icap://127.0.0.1:1344/cherokee bypass=0
>>>> adaptation_access service_avi_resp allow all
>>>>
>>>> Jeff
>>>
>>> Eliezer,
>>>
>>> Well, you certainly hit the nail on the head.  I added the following
>>> code to check the content being sent to the icap server from squid,
>>> and here is what I found when I checked the headers sent by the
>>> remote web server:
>>>
>>> Code added to c-icap to check the content type and encoding received
>>> by the icap server:
>>>
>>>    /* req is the ci_request_t for the current transaction */
>>>    ci_headers_list_t *hdrs;
>>>    const char *content_type, *content_encoding;
>>>
>>>    hdrs = ci_http_response_headers(req);
>>>    content_type = ci_headers_value(hdrs, "Content-Type");
>>>    if (content_type)
>>>       ci_debug_printf(1,"srv_cherokee:  content-type: %s\n",
>>>                       content_type);
>>>
>>>    content_encoding = ci_headers_value(hdrs, "Content-Encoding");
>>>    if (content_encoding)
>>>       ci_debug_printf(1,"srv_cherokee:  content-encoding: %s\n",
>>>                       content_encoding);
>>>
>>> And the output from scanned pages sent over from squid:
>>>
>>> srv_cherokee:  init request 0x7f3dbc008eb0
>>> pool hits:1 allocations: 1
>>> Allocating from objects pool object 5
>>> pool hits:1 allocations: 1
>>> Geting buffer from pool 4096:1
>>> Requested service: cherokee
>>> Read preview data if there are and process request
>>> srv_cherokee:  content-type: text/html; charset=utf-8
>>> srv_cherokee:  content-encoding: gzip         <-- As you stated, I am getting gzipped data
>>> srv_cherokee:  we expect to read :-1 body data
>>> Allow 204...
>>> Preview handler return allow 204 response
>>> srv_cherokee:  release request 0x7f3dbc008eb0
>>> Store buffer to long pool 4096:1
>>> Storing to objects pool object 5
>>> Log request to access log file /var/log/i-cap_access.log
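>>>
>>> Since the body arrives gzip-compressed, the icap service would have
>>> to inflate it before scanning.  A minimal sketch using zlib (the
>>> helper name and the single-buffer assumption are mine, not part of
>>> c-icap):
>>>
>>>    #include <zlib.h>
>>>    #include <string.h>
>>>
>>>    /* Inflate a gzip-encoded buffer so the icap service can scan
>>>     * plain html.  Assumes the whole body fits in one buffer.
>>>     * Returns 0 on success, -1 on error. */
>>>    static int gunzip_body(const char *in, size_t in_len,
>>>                           char *out, size_t *out_len)
>>>    {
>>>        z_stream zs;
>>>        memset(&zs, 0, sizeof(zs));
>>>
>>>        /* 16 + MAX_WBITS tells zlib to expect a gzip header
>>>         * rather than a raw deflate stream */
>>>        if (inflateInit2(&zs, 16 + MAX_WBITS) != Z_OK)
>>>            return -1;
>>>
>>>        zs.next_in   = (Bytef *)in;
>>>        zs.avail_in  = (uInt)in_len;
>>>        zs.next_out  = (Bytef *)out;
>>>        zs.avail_out = (uInt)*out_len;
>>>
>>>        int rc = inflate(&zs, Z_FINISH);
>>>        *out_len = zs.total_out;
>>>        inflateEnd(&zs);
>>>        return (rc == Z_STREAM_END) ? 0 : -1;
>>>    }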
>>>
>>>
>>> Wikipedia  at https://en.wikipedia.org/wiki/HTTP_compression describes
>>> the process as:
>>>
>>> " ...
>>>   Compression scheme negotiation[edit]
>>>   In most cases, excluding the SDCH, the negotiation is done in two
>>> steps, described in
>>>   RFC 2616:
>>>
>>>   1. The web client advertises which compression schemes it supports
>>> by including a list
>>>   of tokens in the HTTP request. For Content-Encoding, the list in a
>>> field called Accept -
>>>   Encoding; for Transfer-Encoding, the field is called TE.
>>>
>>>   GET /encrypted-area HTTP/1.1
>>>   Host: www.example.com
>>>   Accept-Encoding: gzip, deflate
>>>
>>>   2. If the server supports one or more compression schemes, the
>>> outgoing data may be
>>>   compressed by one or more methods supported by both parties. If
>>> this is the case, the
>>>   server will add a Content-Encoding or Transfer-Encoding field in
>>> the HTTP response with
>>>   the used schemes, separated by commas.
>>>
>>>   HTTP/1.1 200 OK
>>>   Date: mon, 26 June 2016 22:38:34 GMT
>>>   Server: Apache/1.3.3.7 (Unix)  (Red-Hat/Linux)
>>>   Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>>>   Accept-Ranges: bytes
>>>   Content-Length: 438
>>>   Connection: close
>>>   Content-Type: text/html; charset=UTF-8
>>>   Content-Encoding: gzip
>>>
>>>   The web server is by no means obligated to use any compression method
>>> –
>>> this
>>>   depends on the internal settings of the web server and also may
>>> depend on the internal
>>>   architecture of the website in question.
>>>
>>>   In case of SDCH a dictionary negotiation is also required, which
>>> may involve additional
>>>   steps, like downloading a proper dictionary from .
>>> .."
>>>
>>>
>>> So, it looks like it is a feature of the browser.  Is it possible to
>>> have squid gunzip the data, or to configure it to remove the
>>> "Accept-Encoding: gzip, deflate" header from the request sent to the
>>> remote server that tells it to gzip the data?
>>>
>>> Thanks
>>>
>>> Jeff
>>
>
> Well,
>
> After reviewing this problem and all of the great technical
> information folks provided, I have it working.  The best way I found
> to deal with this transparently is to let squid spoof the remote
> server side with modified request headers.
>
> Compile squid with the flag:
>
> --enable-http-violations
>
> then add the following to the squid.conf file:
>
> # disable remote html data compression by replacing HTTP request headers
> # requires squid build option --enable-http-violations
> request_header_access Accept-Encoding deny all
> request_header_replace Accept-Encoding *;q=0
>
> These directives tell squid to strip all Accept-Encoding request
> headers and substitute the string "*;q=0", whose quality value of 0
> tells the server that no encoding is acceptable, so it should not send
> any compressed data.  I tested this with Chrome, which was configured
> to always send "Accept-Encoding: gzip, deflate", and all of the C-ICAP
> data I am seeing is plain-text html, which is what I wanted.
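>
> As a rough illustration (www.example.com is just a placeholder), a
> browser request that originally carried:
>
>   GET / HTTP/1.1
>   Host: www.example.com
>   Accept-Encoding: gzip, deflate
>
> leaves squid on its way to the origin server as:
>
>   GET / HTTP/1.1
>   Host: www.example.com
>   Accept-Encoding: *;q=0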
>
> So adding those two header directives transparently spoofs the remote
> server into always sending uncompressed data.  I did note that Google
> has some odd behavior with Chrome (Mozilla does not trigger it): even
> when the Chrome request headers have been rewritten by squid to refuse
> gzip and deflate, the Google servers will still send a
> content-encoding: gzip response header on responses which do not
> contain any data, which looks like a bug of some sort in their HTTP
> responses.
>
> The browser treats the data as plain text and works correctly, even
> though it gets a content-encoding: gzip header.  Google only seems to
> do this on header-only requests and responses which have no body text.
> Responses which actually contain body data omit the content-encoding
> header, which is what I wanted to see happen.
>
> So to summarize, the above changes enable squid to filter and spoof
> the Accept-Encoding header in HTTP requests so the server always sends
> uncompressed data.  From my testing it works transparently with Chrome
> and Mozilla, with C-ICAP getting the uncompressed, unencrypted data it
> needs.
>
> Jeff
>

One caveat about this: I discovered that there are quite a few websites
which completely ignore the Accept-Encoding request header and just go
ahead and send gzipped html data even when you tell them not to.  Oh
well, back to the drawing board.
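
For sites like that, a fallback would be to have the icap service sniff
the body itself instead of trusting the headers.  A minimal sketch (the
helper name is made up, not part of c-icap):

   #include <stddef.h>  /* size_t */

   /* gzip streams start with the magic bytes 0x1f 0x8b, so a
    * compressed body can be detected even when the server ignores
    * Accept-Encoding or omits Content-Encoding. */
   static int looks_gzipped(const unsigned char *body, size_t len)
   {
       return len >= 2 && body[0] == 0x1f && body[1] == 0x8b;
   }

Anything that matches could then be fed through the same zlib gunzip
helper sketched earlier before being scanned.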

Jeff
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users



