Re: Squid Reverse Proxy and WebDAV caching

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Sat, 26 Aug 2017 16:21:20 +1200

On 26/08/17 00:49, Olivier MARCHETTA wrote:
Hello,

Finally Squid is caching my SharePoint online documents.
But it doesn't work yet.
If I enable offline mode, the WebDAV client will not be able to download documents from the cache.

That directive (offline_mode) was designed for HTTP/1.0 behaviours and 
only works for objects with optional revalidation, i.e. when the server 
delegates the caching freshness decision to the proxy.

When it is applied to content with mandatory revalidation (such as 
anything with no-cache, private, no-store or must-revalidate directives 
in HTTP/1.1 traffic), the result is that things are prohibited from 
being delivered AND prohibited from being updated.

And I will see the following errors in the log:

---------------------------------------------------------------------------------
TCP_OFFLINE_HIT_ABORTED/000	https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/large1%20-%20Copy%20-%20Copy%20-%20Copy%20-%20Copy.docx
TCP_OFFLINE_HIT_ABORTED/000	https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/large1%20-%20Copy%20-%20Copy%20-%20Copy%20-%20Copy.docx
---------------------------------------------------------------------------------

Squid was simply not able to deliver anything to this client, not even 
an error message for some reason.

It might be a bug in Squid preventing it from generating an error page 
(ABORTED with 5xx status). But usually ABORTED/000 means the client was 
the one aborting / disconnecting before any HTTP response at all could 
be delivered.

If I disable offline mode, then nothing gets downloaded from the cache.

How are you determining that?

What I can see in the info so far provided is that Squid *is* finding 
cached content to work with.

I have removed all ACL control from the squid conf (to make it easier for now).
I have replaced all refresh patterns with custom ones (which I found on the Internet from another SharePoint caching project).

Sorry for the long file below, but I am posting my conf file again.
I don't know why the Squid cache is aborting the cache HIT.

You are forcing Squid to cache things that are marked as non-cacheable 
because they contain client-specific security or privacy details. Since 
the proxy is unable to determine for itself (on these objects) what 
details go to which client, caching these things can only be done with 
revalidation before HIT delivery.

Then you are also configuring Squid to be forbidden to revalidate 
anything at all.

I suspect we have a bug somewhere in Squid that makes it do the 
ABORT/000; it should be doing a forced-MISS or a 5xx error with your 
config. But that is not what you need to happen anyhow, so fixing 
that particular bug won't help you.

If you have any clue, it would be very welcome.

---------------------------------------------------------------------------------
http_port 92.222.209.108:3128
icp_port 0
digest_generation off
dns_v4_first on
pid_filename /var/run/squid/squid.pid
cache_effective_user squid
cache_effective_group proxy
error_default_language en
icon_directory /usr/local/etc/squid/icons
visible_hostname sv-1101-wvp01.virtualdesk.cloud
cache_mgr pfsense@virtualdesk.cloud
access_log /var/squid/logs/access.log
cache_log /var/squid/logs/cache.log
cache_store_log none
netdb_filename /var/squid/logs/netdb.state
pinger_enable on
pinger_program /usr/local/libexec/squid/pinger

logfile_rotate 7
debug_options rotate=7
shutdown_lifetime 3 seconds
# Allow local network(s) on interface(s)
acl localnet src  92.222.209.0/24
forwarded_for on
uri_whitespace strip

cache_mem 128 MB
maximum_object_size_in_memory 512 KB
memory_replacement_policy heap GDSF
cache_replacement_policy heap LFUDA
minimum_object_size 0 KB
maximum_object_size 20 MB
cache_dir ufs /var/squid/cache 100 16 256
offline_mode off
cache_swap_low 90
cache_swap_high 95
cache allow all

# Cache documents regardless what the server says
refresh_pattern .jpg 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
refresh_pattern .gif 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
refresh_pattern .png 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
refresh_pattern .txt 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
refresh_pattern .doc 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
refresh_pattern .docx 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
refresh_pattern .xls 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
refresh_pattern .xlsx 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth
refresh_pattern .pdf 14400 50% 18000 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-cache ignore-private ignore-auth

The normal refresh_pattern lines should stay, placed down here 
after your custom ones. At minimum the cgi-bin and '.' patterns are 
necessary for correct handling of dynamic content in the cache.
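
For reference, a sketch of those stock lines as they appear in the 
default squid.conf of Squid 3.x releases. Check the squid.conf.default 
shipped with your own version, as the exact set can differ:

 # ... your custom refresh_pattern line(s) go above this point ...
 refresh_pattern ^ftp:             1440  20%  10080
 refresh_pattern ^gopher:          1440   0%   1440
 refresh_pattern -i (/cgi-bin/|\?)    0   0%      0
 refresh_pattern .                    0  20%   4320

Squid uses the first pattern that matches a URL, which is why the 
catch-all '.' line has to come last.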

[ Sorry I pressed send by accident earlier before completing that 
"Also," statement which was intended to say that. ]

* The ignore-no-cache option was removed from Squid some versions ago. 
As I mentioned earlier, CC:no-cache actually means things *are* cacheable 
in HTTP/1.1, so the directive's intended effect is met by current Squid's 
default behaviour.

* The 50% only means +50% of the object's current age, which can be very 
short for frequently or recently updated objects. Percentages over 100% 
are possible here, and usually necessary for good caching times.

* override-lastmod was useful once to avoid bugs (and side-effects from 
misconfigured percentages mentioned above). But current Squid can figure 
out Last-Modified values from Dates and timestamps as needed. So the 
option is rarely necessary and more often than not actually causes worse 
caching by prohibiting Squid from doing heuristic freshness calculations. 
YMMV, so testing for your specific traffic is needed before use of this 
option in current Squid.
 --> And remember how I mentioned offline_mode only works when the 
proxy is delegated the freshness calculations? This option prohibits 
Squid from doing that calculation and uses the admin 14400 minute value 
instead.

* "reload-into-ims ignore-reload" these two options are mutually 
exclusive. Changing a reload header value and ignoring it cannot be done 
simultaneously. Pick one:

 ignore-reload - completely ignore the client indication that it needs 
the latest data. Note that this is redundant with what offline_mode 
does, but far more selective about what URLs it happens for.

 reload-into-ims - ask the server if any changes have happened, so the 
cached content can be delivered if none instead of a full re-fetch.

* Since all of these lines are identical except for the regex pattern 
for the URLs they apply to, you would save a lot more CPU cycles by 
combining the regex into one pattern and having only one config line for 
the lot.

 refresh_pattern \.(jpg|gif|png|txt|docx?|xlsx?|pdf) 14400 50% 18000 \
   override-expire reload-into-ims ignore-private ignore-auth

* ignore-auth - I would also check the actual response headers from the 
server before using this option. While authentication credentials 
normally mean non-cacheable in HTTP/1.0 traffic, in HTTP/1.1 they mean 
mandatory revalidation in most cases and sometimes are irrelevant.
 What this option actually does is exclude special handling when auth 
headers are present - it actively *prevents* some HTTP/1.1 traffic being 
HIT on, when the special conditions were saying auth was cacheable or 
irrelevant.
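
To make the percentage point above concrete, here is a rough worked 
example. The 0 / 200% / 18000 values and the shortened pattern are 
illustrative assumptions only, not tuned numbers for SharePoint:

 # For an object with no explicit expiry information, Squid treats it
 # as fresh while its age is below min, or below
 #   percent x (time between Last-Modified and when it was fetched),
 # and never beyond max.
 #
 # Object last modified 2 days before Squid fetched it:
 #    50% -> heuristically fresh for about 1 day
 #   200% -> heuristically fresh for about 4 days (max is 18000
 #           minutes, i.e. 12.5 days, so the cap is not reached)
 refresh_pattern \.(docx?|xlsx?|pdf) 0 200% 18000 reload-into-ims

A document edited an hour before it was fetched would get only about 
30 minutes of heuristic freshness at 50%, which is the "very short" case 
mentioned above.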

# Setup acls
acl allsrc src all
http_access allow all

request_body_max_size 0 KB
delay_pools 1
delay_class 1 2
delay_parameters 1 -1/-1 -1/-1
delay_initial_bucket_level 100
delay_access 1 allow allsrc

These delay_parameters are doing nothing but wasting a surprisingly 
large amount of CPU time and memory for calculating traffic numbers and 
repeatedly pausing transactions for 0 milliseconds.
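
If no bandwidth limiting is actually wanted, a minimal sketch of the 
alternative is to drop the delay_* lines and leave delay pools at their 
default of disabled:

 # No delay pools. Remove delay_class, delay_parameters,
 # delay_initial_bucket_level and delay_access along with this change.
 delay_pools 0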

# Reverse Proxy settings
https_port 92.222.209.108:443 accel cert=/usr/local/etc/squid/599eae0080989.crt key=/usr/local/etc/squid/599eae0080989.key
cache_peer olicomp.sharepoint.com parent 443 0 no-query no-digest originserver login=PASSTHRU connection-auth=on ssl sslflags=DONT_VERIFY_PEER front-end-https=auto name=rvp_sharepoint

Avoid DONT_VERIFY_PEER like the plague. Find out the CA(s) which sign the 
peer's certs and configure Squid to trust only the right CA for these 
peer links, then add the NO_DEFAULT_CA flag, even if it is one of the 
normal global CAs.

That will prevent unapproved MITM on your upstream traffic and help 
detect traffic loops if the DNS+Squid config gets wonky.
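
As a rough sketch of that change, assuming the Squid 3.5-style cache_peer 
option names used in the line quoted above, and assuming the signing CA 
has been saved to a local PEM file (the ca-sharepoint.pem path is 
hypothetical):

 # sslcafile path is a placeholder; point it at the CA that actually
 # signs the peer's certificate.
 cache_peer olicomp.sharepoint.com parent 443 0 no-query no-digest \
   originserver login=PASSTHRU connection-auth=on ssl \
   sslcafile=/usr/local/etc/squid/ca-sharepoint.pem \
   sslflags=NO_DEFAULT_CA front-end-https=auto name=rvp_sharepoint

Newer Squid releases rename the TLS-related cache_peer options, so check 
the cache_peer documentation for the installed version before copying 
this.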

deny_info TCP_RESET allsrc

This deny_info is explicitly configuring Squid to send a TCP_RESET (aka 
ABORTED/000) when ACL "allsrc" is the reason for transaction denial.

With your access control rules removed it should not be having an 
effect, but beware of the above when you reinstate those rules.
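
For illustration only: if the reset turns out to be unwanted once the 
rules are back, deny_info can name one of the stock error page templates 
instead. Which template fits your setup is an assumption here:

 # Send the standard access-denied page rather than a TCP reset
 # when ACL "allsrc" is the reason for the denial.
 deny_info ERR_ACCESS_DENIED allsrc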

Amos