Re: Possible bug in 3.5.5 or a store change from 2.7?

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Wed, 29 Jul 2015 07:54:24 +1200

On 29/07/2015 5:53 a.m., Tory M Blue wrote:
> squid-3.5.5-1.el6.x86_64
> 
> CentOS 6.6
> 
>  This looks like a bug in Squid v3 or a difference from 2.7.  Our monitor
> couldn't be simpler.  It requests the SAME URL twice (identical in every
> way, same hostname too), and expects the 2nd response to contain the
> X-Squid hit header.  If it does not, then Squid has some sort of race
> condition going on its code.
> 

No. HTTP does not work that way, and Squid certainly does not.

Squid is event driven, and Squid-3 also asynchronously interleaves the
processing of each event's sub-steps, where Squid-2 just used a huge
stack / call chain per event. When operating under load there is always
a slight delay between an asynchronous operation being scheduled and
being run.

You may as well send a request to two completely separate pieces of
hardware and try to draw a conclusion based on one of those being a HIT.

In Squid *every* transaction processed is racing against everything else
that needs to happen *all* of the time.

A code "race condition" under those circumstances means that the steps
of a *single* transaction are running against each other, or that two
things that should not interact are clobbering each other's state data.
Anything else Squid may be doing in parallel is irrelevant.

HTTP itself is stateless, and always has been. Any stateful
interaction between *different* transactions has always been an
illusion. Caching brings that illusion a bit closer to solidity, but it's
still an illusion.

For example your "fail" result does not distinguish between the second
"MISS" being a cache near-HIT, a full MISS, or a revalidation MISS.

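As a rough sketch of telling those cases apart, the result tag Squid
writes to access.log can be bucketed. The grouping of tags onto the three
cases is my own loose mapping, and it assumes the default native log
format, where the fourth field is tag/status. The sample line is made up.

```python
# Loose grouping of Squid access.log result tags (an illustrative
# assumption, not an official classification).
BUCKETS = {
    "TCP_HIT": "hit",
    "TCP_MEM_HIT": "hit",
    "TCP_REFRESH_UNMODIFIED": "near-hit (revalidated, cached body reused)",
    "TCP_REFRESH_MODIFIED": "revalidation miss (server sent a new body)",
    "TCP_MISS": "full miss",
}

def classify(log_line):
    """Return (tag, bucket) for one line in the default native log format."""
    fields = log_line.split()
    tag = fields[3].split("/")[0]  # fourth field looks like TCP_MISS/200
    return tag, BUCKETS.get(tag, "other")

# Made-up sample line, for illustration only.
sample = ("1438104336.123     12 192.0.2.10 TCP_REFRESH_UNMODIFIED/200 403 "
          "GET http://example.com/tiny.jpg - HIER_DIRECT/192.0.2.20 image/jpeg")
print(classify(sample))
```

Any tag not in the table (aborted transactions, for example) falls into
"other" rather than being guessed at.
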
HTTP/1.1 can do some funky stuff sometimes. Picture a
Cache-Control:no-cache header causing replies with auth credentials
embedded to be stored for later HITs by other users. Yes, that's right, and
it's actually one of the desirable features - we just have to fetch new
headers from the server to attach to the object on a later HIT. (That's a
near-HIT BTW).

>  I just reproduced this by hand, using an HTTP sniffer tool.  I requested
> the same URL twice, with about a 0.25 second delay between fetches, and the
> 2nd attempt was ALSO A MISS.  Then I waited 1 second and tried a 3rd time,
> and it was FINALLY a hit.

Had the first request finished being delivered to your testing tool
before the second request was sent?

If the answer is no, then this test itself is invalid, because of the
next thing ...

> 
> Squid v3 seems to have changed the way it stores objects.  Maybe it is
> doing some sort of "asynchronous" background store now, so if you send in
> sequential requests without a delay between, it may not actually have
> finished storing it yet, so it doesn't report a "hit".  Meaning, the first
> "miss" response may have fired off a thread to store the object, and not
> doing it in the main thread anymore like in v2, if you get my meaning.

Squid-3 in-transit objects are not listed as existing in cache_mem
storage until they have completely (and successfully) finished their
first use.

You can configure "collapsed_forwarding on" to enable the Squid-2.7
behaviour, which treats the in-transit objects list as if it were
another cache area.

But be aware that the speed of the first client reading the object is
applied to later clients too, even if they require faster delivery. The
way HTTP/1.1 variant objects work also means some clients may have added
lag waiting for an object they then discover to be unusable, and so have
to go fetch their correct one - resulting in 2x normal MISS latency.
These are not new problems though; 2.7 had them in its own way.
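
For reference, that is a single line in squid.conf. A minimal fragment,
assuming the rest of your config stays as it is:

```
# squid.conf
# Let requests that arrive while an object is still being fetched share
# that in-transit fetch.
collapsed_forwarding on
```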

> 
> For the Squid dev team, here are the headers we are sending back from the
> origin App VIP:
> 
> 
> Accept-Ranges: none
> 
> Access-Control-Allow-Origin: *
> 
> *Cache-Control: max-age=300*
> 
> Connection: close
> 
> Content-Length: 403
> 
> Content-Type: image/jpeg
> 
> Date: Tue, 28 Jul 2015 17:25:36 GMT
> 
> Expires: Tue, 28 Jul 2015 17:30:36 GMT
> 
> Last-Modified: Mon, 11 Jun 2012 04:25:18 GMT
> 
> Server: Apache/2.2.26 (Unix)
> 
> 
> This should very much be cached right away and it's a simple tiny image.
> 
> 
> One thing we also notice is this only occurs doing load, meaning when we
> have production load traffic this fails, but if there is no other
> connections to the box, no other queries, this does not fail. Is it
> possible that squid is ejecting this that fast or is there another
> possibility here? Not sure what other data I can provide but will if asked.
> 

The timing between *completion* of the first request and the start of the
second is the critical factor. If the first request has not finished,
there's no hope of a HIT.
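
A rough sketch of a probe that respects that ordering: it reads the first
response to the last byte before it sends the second request. The URL in
the comment is a placeholder, and the sketch assumes your Squid adds its
usual X-Cache header to replies (check what your monitor actually looks
for).

```python
import time
import urllib.request

def cache_status(headers):
    """Return 'HIT', 'MISS' or None from an X-Cache value like 'HIT from proxy1'."""
    value = headers.get("X-Cache") or ""
    first_word = value.split(" ", 1)[0].upper()
    return first_word if first_word in ("HIT", "MISS") else None

def fetch_fully(url):
    """Fetch url and drain the whole body before returning the cache status."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()  # the first transaction must complete before we move on
        return cache_status(resp.headers)

def probe(url, gap=0.25):
    """Send two strictly sequential requests, gap seconds apart."""
    first = fetch_fully(url)
    time.sleep(gap)
    second = fetch_fully(url)
    return first, second

# Example with a placeholder URL (point it at an object served through
# your Squid):
#   print(probe("http://squid.example.com/tiny.jpg"))
```

Even then, a MISS on the second request under load is not proof of a bug;
the probe only rules out the case where the two requests overlap.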

The size of the memory cache relative to the churn currently going on
also matters. If you wait too long between the test requests the object
will be pushed out and you will get a MISS again anyway. But I don't
think that is a factor with 250ms being your timing.

Amos