
Re: Accelerating proxy not matching cgi files


 



On 25/08/11 21:38, Mateusz Buc wrote:
2011/8/24 Amos Jeffries<squid3@xxxxxxxxxxxxx>:

Maybe. We would need to see the HTTP headers produced by gen.cgi to be sure.
From the description of how index.cgi/gen.cgi interact I think it highly
likely the lack of Cache-Control and Last-Modified information from gen.cgi
is causing the cache algorithms to determine it's unsafe to store.


I gained access to the code of gen.cgi and made a few changes:

         printf("Cache-Control: max-age=600, s-maxage=300\n");
         printf("Last-Modified: %s\n", mdate);

It now fetches the timestamp from the URL, parses it into the appropriate
format, and outputs it as the Last-Modified header. I also added Cache-Control.
The results are noticeable - I now get mostly TCP_REFRESH_UNMODIFIED/304
on my test page (the gen.cgi links don't change there, so all timestamps
stay the same).

Thank you very much for these suggestions!

However, I still can't get these URLs/images cached by my Squid. Is
there any chance they can be served directly from the Squid cache when
they do not change? Network bandwidth is obviously reduced now, but I'm
not sure about CPU load: it still takes almost the same time to load
the URL (about 8 seconds).

Halfway there. Stage 1 complete after a fashion.

Meaning of "TCP_REFRESH_UNMODIFIED/304":
 - TCP_ = TCP transport used
 - REFRESH = If-Modified-Since sent to the origin (aka gen.cgi)
 - UNMODIFIED = the full object came back; headers and body apparently identical to the known cached copy
 - /304 = converted to a 304 "no change" response for the client half of the transaction

The 304 portion going across client<->Squid is where you are getting *all* the bandwidth savings right now.

As I said earlier:


At this point incoming requests will either be requesting brand new content or
have an If-Modified-Since: header containing the cached object's Last-Modified: timestamp.

 NOTE: You will not _yet_ see any reduction in the 200 requests. Potentially you might
actually see an increase as "must-revalidate" causes middleware caches to start working better.

The difference you are seeing to what I predicted is caused by your use of max-age instead of must-revalidate.

max-age allows the browsers to cache the graphs for 600 seconds, so you will get _zero_ repeat traffic for that duration. That is the exact opposite of what must-revalidate would do for you.

On top of that, you cannot see Squid serving HIT requests because of s-maxage. It's set at 300, so the Squid copy expires before the browser copies do. By the time a browser _does_ send an IMS request, the Squid copy has already expired, forcing a contact to gen.cgi to check for updates.

Okay, fine, use max-age and s-maxage. To get HITs under the current circumstances, set s-maxage larger than max-age. Or omit it entirely and have Squid cache for the same length of time as any browser. The Squid cache is shared by all clients, so you will get some HITs that way, but not a lot more.


Do you have any further tips?


Just this: Keep going.

You are roughly up to the end of Step 1 of my earlier instructions. Step 2 is where the CPU benefits start appearing.

Every time gen.cgi can decide that If-Modified-Since is newer than the graph data, it saves all the graph-production CPU time AND the graph's size worth of bandwidth.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.14
  Beta testers wanted for 3.2.0.10


