Re: questions about what's in my logs...

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Fri, 19 Jul 2013 00:43:41 +1200

On 19/07/2013 12:10 a.m., Travel Factory S.r.l. wrote:

Yesterday I moved all my users to the 2 new servers for several hours.

Since I want to test SMP / Rock and eventually SMPCarp I went to have 
a look at my logs.

My first goal is to understand which max-size to set for the rock cache_dir.
So I did this on one server:
grep SWAPOUT store.log.0 | sed -e 's/  */|/g' | cut -d"|" -f11 | cut -d"/" -f2 > sizes
wc -l sizes
491458
followed by:

cat sizes | sort -g | uniq -c > result
wc -l result
68900

you can download the file, if you want, from www.bruxx.it/frank/result.

These are SWAPOUT entries and as far as I know they are stored on 
disk... are they ?

You will notice that there are 18902 requests for 43 bytes SWAPOUT. 
15414 are from http://p.twitter.com/t.gif?

Is it normal that these files are cached ?

Yes. Looks like a web-bug to me, and a lot of those are coded up using 
"no-cache" as if it were preventing caching. That can be a nice saving of 
bandwidth if it is an icon-sized bug, but for the 1px ones the IMS 
(If-Modified-Since) headers can be larger than the original payload was 
to begin with - so there is no gain from using no-cache over no-store.
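
To put rough numbers on that (a back-of-envelope sketch, not anything Squid itself computes): taking the last-modified timestamp 1328738114 from the store.log entry quoted in this thread, a single If-Modified-Since request header line can be compared against the 43-byte GIF, before even counting the rest of the request or the 304 reply headers. The helper name is made up for illustration.

```python
from email.utils import formatdate

PAYLOAD_BYTES = 43          # size of t.gif as reported in store.log
LAST_MODIFIED = 1328738114  # lastmod field from the same store.log line


def ims_header_line(lastmod: int) -> bytes:
    """Build the If-Modified-Since header line a revalidation would carry."""
    http_date = formatdate(lastmod, usegmt=True)  # HTTP-style date in GMT
    return f"If-Modified-Since: {http_date}\r\n".encode("ascii")


line = ims_header_line(LAST_MODIFIED)
print(len(line), "header bytes vs", PAYLOAD_BYTES, "payload bytes")
```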

1374048878.108 SWAPOUT 00 00008061 FE66CED6D9B9E3E31D39654ED9FE19FA 
200 1374048878 1328738114 -1 image/gif 43/43 GET 
http://p.twitter.com/t.gif?

I can't find a single HIT in access log, well.... ok, I have 1265 
TCP_REFRESH_UNMODIFIED/304
On my production server (squid 2.7) I only have TCP_MISS in the logs !

So I arrive at the questions:

Is it normal that these queries, with the ?, are cached ?

Yes. They are URLs just like any other. Nothing special there except the 
missing query-string portion.

Is there a list of domains/pages that it is better not to cache since 
they are changing anyway ?

No, that is not possible. There is no such thing as a page in HTTP. 
Really. There are only objects, and some of those objects happen to be 
indexes of other objects' URLs with some display markup about how to 
format the collection if and when they are all downloaded.

But every response has cache control headers saying whether that 
particular response is cacheable or not. Squid obeys those headers 
unless you configure it to disobey the protocol somehow.
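
As a rough illustration of what those headers say (a deliberately simplified sketch, not Squid's actual decision logic, which also weighs things like private, Expires, Vary, authentication and refresh_pattern), the no-cache and no-store directives differ like this. The function name and the three labels are invented for the example.

```python
def classify_cache_control(value: str) -> str:
    """Very simplified reading of a response Cache-Control header value."""
    # Directives are comma separated and case-insensitive; some carry "=args".
    # Assumption: no quoted arguments containing commas (e.g. no-cache="a, b").
    names = {part.split("=", 1)[0].strip().lower() for part in value.split(",")}
    names.discard("")
    if "no-store" in names:
        return "do-not-store"           # must not be written to any cache
    if "no-cache" in names:
        return "store-but-revalidate"   # may be stored, revalidate before reuse
    return "no-explicit-restriction"    # everything else is ignored in this sketch


for header in ("no-cache", "no-store, no-cache", "max-age=3600"):
    print(header, "->", classify_cache_control(header))
```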

After removing the 3213 entries with 0 bytes, I have 302541 entries 
with less than 9000 bytes... they cover 75% of the cached requests... 
is 9000 a good tradeoff ?

Don't remove those entries with 0-bytes. Even responses without a 
payload can be cacheable, and they are in the size range where you 
benefit from having them in memory or in a rock store.
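
One way to redo that count with the 0-byte entries left in is sketched below. It assumes the store.log layout seen in this thread (whitespace-separated fields, the 11th being a pair like 43/43, of which the second number is taken just as the cut pipeline does). The function names are placeholders, and the comparison is "less than or equal to" the candidate limit.

```python
def swapout_sizes(lines):
    """Yield the object size of every SWAPOUT line (second number of field 11)."""
    for line in lines:
        fields = line.split()
        if len(fields) < 11 or fields[1] != "SWAPOUT":
            continue
        try:
            yield int(fields[10].split("/")[1])
        except (IndexError, ValueError):
            continue  # skip lines whose size field is not in the a/b form


def coverage(sizes, limit):
    """Fraction of entries whose size is <= limit; 0-byte entries count too."""
    sizes = list(sizes)
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s <= limit) / len(sizes)


# The single store.log line quoted in this thread, used as sample input.
sample = [
    "1374048878.108 SWAPOUT 00 00008061 FE66CED6D9B9E3E31D39654ED9FE19FA "
    "200 1374048878 1328738114 -1 image/gif 43/43 GET http://p.twitter.com/t.gif?"
]
print(coverage(swapout_sizes(sample), 9000))
```

For a real log, pass an open file object (for example open("store.log.0")) in place of the sample list and try several candidate limits.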

My biggest SWAPOUT entry is for 4MB files.... I have this line:
maximum_object_size 5 GB
but perhaps GB is not recognized ?????

Squid recognizes units up to PB right now.  Perhaps you placed it below 
the cache_dir lines in your config file. There is a bug in recent 
releases where that directive only works if it is configured above the 
cache_dir lines.
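
A quick way to check a config for that ordering problem is sketched below. It only looks at the first word of each non-comment line and does not follow include directives. The function name is made up, and the sample config text (including the cache_dir path and sizes) is purely illustrative.

```python
def max_object_size_after_cache_dir(conf_text: str) -> bool:
    """True if a maximum_object_size line appears below any cache_dir line."""
    seen_cache_dir = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        directive = line.split()[0]
        if directive == "cache_dir":
            seen_cache_dir = True
        elif directive == "maximum_object_size" and seen_cache_dir:
            return True
    return False


# Illustrative config only: the path and sizes are assumptions, not from the thread.
sample = """
cache_dir rock /var/cache/squid/rock 1024 max-size=32768
maximum_object_size 5 GB
"""
print(max_object_size_after_cache_dir(sample))
```

If this reports the problem for your squid.conf, moving the maximum_object_size line above the first cache_dir line is the thing to try.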

Or perhaps the biggest file transferred simply is 4MB (I agree it 
is probably the default maximum_object_size limit, but it could be a fluke).

Amos