On 3/08/2016 2:06 p.m., Sam M wrote:
> Reading through the documentation of the Collapsed Forwarding feature
> I don't know if this feature would help, as the problem to me feels
> like it is the squid eviction process and decision. It looks like
> squid is storing more than what is being set in the cache_dir and not
> evicting the least recently used files at the right time because of
> the heavy request load.

A lot is going on. More on that below.

Also, no mistake is possible about too-early eviction, because the
"right time" is exactly when something else needs to use that piece of
cache space. Under heavy traffic the time-based eviction of data almost
never happens; everything cycles out due to load pressure far earlier
than it would naturally expire. The rare pieces of data that manage to
reach their stale timeout are evicted *later* than that staleness
point.

> Does that make sense, or is there another explanation for the issue
> I'm having?

Yes. CF is for overlapping requests from the client. It is one of the
things that may be going on. But not by default (see the PPS at the
end of this message).

> On Tue, Aug 2, 2016 at 9:51 PM, Sam M wrote:
>
>> Hi Eliezer,
>>
>> Thanks for your prompt reply. We are testing our squid configuration
>> before we use it. That said, all objects are 1 MB in size, and in
>> order to test squid we queried a sequence of files multiple times in
>> a manner that, theoretically, at the end of the querying process we
>> should get the same number of hits from cache1, cache2, cache3, and
>> cache4.
>>
>> The structure of the test network is: User (using a script) ->
>> cache1 -> cache2 -> cache3 -> cache4 -> web server (stores the
>> queried files).
>>
>> I'm gonna try the Collapsed Forwarding feature and will post back if
>> this fixes the issue.
>>
>>> *From:* Sam M
>>> *Sent:* Tuesday, August 2, 2016 8:43 AM
>>>
>>> Hi,
>>>
>>> I'm querying lots of files through 4 cache servers connected in a
>>> parent hierarchy. I clean all the caches before I start and then I
>>> query the files again in the same exact order. Weirdly, every time
>>> I check the logs, I see a different cache served a file compared
>>> with the previous test. The query process is done through a python
>>> script that uses wget through a proxy to the cache, hence the query
>>> process is really fast.
>>>
>>> Interestingly, if I put a delay of 1 second between each query, the
>>> result will be stable and the same every time I run the script.
>>>
>>> Following is a snippet from the config file. I have changed it many
>>> times trying to make it reproduce the same results, but that didn't
>>> help:
>>>
>>> cache_dir ufs /var/spool/squid 9 16 256
>>> cache_mem 0 MB
>>> memory_pools off
>>> cache_swap_low 100
>>> cache_swap_high 100
>>> maximum_object_size_in_memory 0 KB
>>> cache_replacement_policy lru
>>> range_offset_limit 0
>>> quick_abort_min 0 KB
>>> quick_abort_max 0 KB
>>>
>>> Can someone shed some light on the issue and how to fix it please?

TL;DR: Does not sound like a problem to me. That behaviour is how HTTP
works.

HTTP is stateless by design. Each request is evaluated at each proxy
independently. The network itself is dynamic, in timing and
known-state. When you are dealing with things on the nanosecond scale,
details as low down as the ARP cache, and maybe lower, affect the RTT
and thus the timing data Squid stores about its peers and reachable
servers. The HTTP object cache is just one amongst many types of
caches having effects - both inside and outside Squid.

At longer timescales, DNS results can be differently ordered, or
rotating per-lookup, or plain different (but 'static') content per
lookup source. Even the server memory access speed plays a part, by
delaying traffic (or not) by some nanoseconds during the cache index
lookup.

There is also the absolute UTC timestamp of the request reaching the
origin server versus the Expires/Cache-Control/Age/Date headers it
produces, which are themselves affected by all sorts of things internal
to the origin. The deltas of these timestamps relative to the cache's
absolute UTC timestamp on receiving the response are dynamic, and that
dynamism increases the smaller the timescale one looks at (ie the
faster the traffic).

[probably more there I've missed]

All those little details affect, in some way, the determination of any
given request's destination and whether it is served from cache. And
that determination is made independently by each of the proxies in the
traffic chain at the point in time where each separate request passes
through it. So with 4x caches in your chain all these tiny details are
compounded 4x on each request. Of course it's going to fluctuate, even
in isolated test traffic.

HTTP has a 1-second resolution on caching calculations for good reason.
And even that is not enough to average out the entire effect when you
compound the clock variance with multiple layers of proxy. Unless you
wait multiples of whole seconds between each test request you are
guaranteed to see at least some variance in the HIT vs MISS behaviour.
Even waiting, you might see variation in which particular cache was a
HIT.
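To make that 1-second resolution concrete, here is a toy sketch of the
whole-second age arithmetic from RFC 7234 section 4.2.3. It is plain
Python, not Squid's actual code, and the timestamps and max-age value
are made up for illustration. Two test requests only 200 ms apart can
land on opposite sides of a whole-second boundary and get opposite
freshness verdicts:

  # Toy model of RFC 7234 section 4.2.3. All header values are whole
  # seconds, which is all that Date:/Age:/Expires: can carry.
  def current_age(age_value, date_value, request_time, response_time, now):
      apparent_age = max(0, response_time - date_value)
      corrected_age_value = age_value + (response_time - request_time)
      corrected_initial_age = max(apparent_age, corrected_age_value)
      resident_time = now - response_time
      return corrected_initial_age + resident_time

  DATE = 1470182400    # origin's Date: header (made-up timestamp)
  MAX_AGE = 60         # Cache-Control: max-age=60
  STORED = 1470182401  # when this cache received the response

  # Two requests 200 ms apart; the cache truncates its clock to whole
  # seconds before comparing, so they straddle a second boundary.
  for wall_clock in (1470182459.9, 1470182460.1):
      now = int(wall_clock)
      age = current_age(0, DATE, STORED, STORED, now)
      verdict = "fresh (HIT)" if MAX_AGE > age else "stale (MISS)"
      print("%.1f -> current_age = %ds -> %s" % (wall_clock, age, verdict))

Compound that coin-flip across four independently ticking proxy clocks
and the per-cache HIT pattern has no reason to repeat between runs.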
PS. Squid is only about 90% compliant with the HTTP/1.1 requirements,
so there are some known bugs in the caching logic that your testing may
encounter as well. Though at least bugs are "stable" in their behaviour
for a given proxy build.

HTH
Amos
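PPS. Since you mentioned trying the Collapsed Forwarding feature: it is
off by default, so each of the four proxies would need it enabled
explicitly in its squid.conf. A minimal snippet, assuming you are on
Squid 3.5 or later where the directive exists:

  # merge concurrent requests for the same URI into one upstream fetch
  collapsed_forwarding on

Note it only helps when identical requests genuinely overlap in time,
which a test that sends one request at a time is unlikely to produce.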