On 04/02/2013 11:41 AM, Ed W wrote:
>>> My main requirement is that I have two proxies on either side of a
>>> bandwidth limited link (with high cost). I want the situation that when
>>> a client GETs some object,
>> A client GETs some object currently in the cache and with ETag, but that
>> cached object is either stale or being forcefully reloaded by the
>> client, right?
> Yes. Or some second client requests the same object so we need to do a
> freshness check, or client clears their cache, or upstream doesn't
> correctly implement IF-MODIFIED-SINCE, etc, etc
>
> I'm not trying to decrease the incidence of squid asking the upstream
> server if the object is fresh (which could also trigger non idempotent
> changes), however, I will try to reduce the amount of bandwidth used
> over the proxy-to-proxy middle link (which crosses an expensive sat
> connection) by ensuring that etags are set on important resources (eg
> creating one where it doesn't exist, using some hash of the content body)
>
> What I have probably failed to consider properly is a change in headers
> between two otherwise identical responses (ie same bodies), but I guess
> that will become clear later.
>
> Also I think VARY support will either drop out or be required. I have a
> use in mind which would become dependent on browser version (eg serving
> webp graphics to chrome)
>>> we can convert this to an IF-NONE-MATCH and
>>> trust the etag confirms that the object is unchanged.
>>
>>
>>> Note, I am aware of the limitations of trusting etags. In my setup I
>>> will have control over the proxy on the high speed side of the
>>> connection and we can use various methods on that side to ensure that
>>> the etags are sane. The main goal is to minimise bandwidth across the
>>> intermediate (expensive) link.
>>>
>>> Previously we discussed all kinds of complex ideas including
>>> implementing trailers, and custom headers with hash values.
>>> On reflection I think everything required can be done using only etag
>>> revalidation (and some tweaking of etags, but squid needs to know
>>> nothing about that...)
>> Yes, reload-into-If-None-Match and stale-into-If-None-Match features
>> sound simple. The latter may even be supported already (will check). If
>> something outside of Squid provides reliable-enough ETags to all
>> cachable responses, then the complexities discussed earlier go away.
>>
>> Please confirm whether my understanding of your updated requirements is
>> correct.
>
> I believe so.
>
> So, the situation is a downstream client talking to two squid proxies in
> a chain, through to the eventual upstream web server. Between the two
> squid proxies is an expensive internet link (charged by the byte) and so
> we desire to minimise bytes across the link.
>
> Essentially an upstream adaptation proxy will be used on the "fast" (ie
> "cheap") side of the connection. This will examine all responses before
> they are handed to "fast side" squid and in this proxy we will beat the
> etag into shape, eg adding an SHA hash if none exists, etc. Obviously I
> have to accept all breakage which occurs if I change the upstream's etag
> - however, I think we have this covered.
>
> My goal is that if an object has the same response body, and it's
> already in squid cache on the "slow" side of the link, then we "freshen"
> the resource by going back to the origin server via our pair of squid
> servers, however, we avoid the transfer of the body back across the
> expensive link (between the two squid proxies) if the etag still matches.
>
> I hope this will also be useful to others than just me!

Yes, I believe most of the ETag improvements you want will be generally
useful, including in environments where ETags come from origin servers
(rather than being added by Squid in violation of HTTP rules).
I still think that the earlier comprehensive design with new, dedicated
checksum headers would be an overall better solution for your specific
problem, but it would take a lot more time to implement, while you should
be able to get quite a bit by simply [ab]using ETags. And since the
changes you need are mostly generally useful, I do not see any big
problems with this simplified approach, at least as the first step.

Cheers,

Alex.
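
For illustration only, here is a minimal Python sketch of the two pieces the simplified approach relies on: adding a hash-based ETag where the origin supplied none, and answering an If-None-Match revalidation without resending the body. This is not Squid code and not the adaptation proxy from the thread. The function names (ensure_etag, revalidate), the use of SHA-256, the plain-dict header model with an exact "ETag" key, and the naive comma split of If-None-Match are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Optional, Tuple


def ensure_etag(headers: Dict[str, str], body: bytes) -> Dict[str, str]:
    """Return a copy of headers; add a body-hash ETag only if none exists.

    Hypothetical helper: SHA-256 is an assumption, the thread only says
    "some hash of the content body".
    """
    out = dict(headers)
    if "ETag" not in out:
        out["ETag"] = '"' + hashlib.sha256(body).hexdigest() + '"'
    return out


def _opaque(tag: str) -> str:
    """Strip an optional weak prefix so tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidate(
    if_none_match: Optional[str], headers: Dict[str, str], body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a (possibly conditional) GET on the "fast" side of the link.

    If the client's If-None-Match lists the current ETag (weak comparison),
    reply 304 with no body, so nothing large crosses the expensive link.
    Otherwise reply 200 with the full body.
    """
    current = headers.get("ETag")
    if if_none_match is not None and current is not None:
        # Naive split: assumes no commas inside the quoted tags themselves.
        candidates = [c.strip() for c in if_none_match.split(",")]
        if "*" in candidates or any(
            _opaque(c) == _opaque(current) for c in candidates
        ):
            return 304, {"ETag": current}, b""
    return 200, headers, body
```

In this sketch the "slow side" cache would send the ETag it stored as If-None-Match; a 304 means the cached body can be served again, and only a changed body (a different hash) costs a full transfer.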