Andreev Nikita wrote:
Why does Squid eat 100% of a processor if the problem is in the FS?
How is your cache_dir defined? aufs is generally a better choice than
ufs, diskd may still have some stability issues under load, and coss is
a good supplement as a small-object cache. Conceivably, if Squid is set
up with a ufs cache_dir mounted over NFS, it's spending a lot of time in
a wait state, blocked while the I/O completes.
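(A quick way to check what the cache_dir actually sits on, assuming it's
the /var/spool/squid path you show further down, is something like:

# df -T /var/spool/squid
# mount | grep /var/spool/squid

The first shows the filesystem type, the second the mount options.)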
For 6 days of uptime:
# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 92 104052 235704 2309956 0 0 3 43 24 33 10 16 73 1 0
As you can see, the system has spent only 1% of its CPU time in I/O wait
(the cpu-wa column).
This also shows the CPU has been 73% idle on average. If I'm not
mistaken, you stated that you have a two-CPU (two-core) system, so even
with one core maxed out the averages would still show around 50% idle.
Running "vmstat 2" while you are experiencing the load will give more
insight.
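For example, to take 30 two-second samples during a busy period:

# vmstat 2 30

Per-core usage (to see whether one core really is pegged) is easier to
read from "top" with the 1 key pressed, or "mpstat -P ALL 2" if sysstat
is installed.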
My cache_dir directive looks like:
cache_dir ufs /var/spool/squid 16384 64 1024
Make sure your Squid can support it (check the output of "squid -v" for
aufs) and change this line to...
cache_dir aufs /var/spool/squid 16384 64 1024
...to enable asynchronous cache accesses.
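For example:

# squid -v

and look for aufs in the --enable-storeio=... part of the configure
options. As far as I know aufs uses the same on-disk layout as ufs, so
the existing cache can be reused after a restart; you shouldn't need to
recreate it just for this change.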
# vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
total merged sectors ms total merged sectors ms cur sec
ram0 0 0 0 0 0 0 0 0 0 0
ram1 0 0 0 0 0 0 0 0 0 0
ram2 0 0 0 0 0 0 0 0 0 0
ram3 0 0 0 0 0 0 0 0 0 0
ram4 0 0 0 0 0 0 0 0 0 0
ram5 0 0 0 0 0 0 0 0 0 0
ram6 0 0 0 0 0 0 0 0 0 0
ram7 0 0 0 0 0 0 0 0 0 0
ram8 0 0 0 0 0 0 0 0 0 0
ram9 0 0 0 0 0 0 0 0 0 0
ram10 0 0 0 0 0 0 0 0 0 0
ram11 0 0 0 0 0 0 0 0 0 0
ram12 0 0 0 0 0 0 0 0 0 0
ram13 0 0 0 0 0 0 0 0 0 0
ram14 0 0 0 0 0 0 0 0 0 0
ram15 0 0 0 0 0 0 0 0 0 0
sda 50114 8198 1197972 239114 771044 986524 13061742 1616345 0 1239
sdb 125 1430 2383 100 3 20 184 43 0 0
sdc 547181 13909 6116481 6209599 2893943 6771249 77505040 42580590 0 8027
dm-0 6659 0 143594 45401 528574 0 4228592 1248409 0 269
dm-1 13604 0 408122 82828 883993 0 7071944 3118925 0 677
dm-2 150 0 1132 387 2 0 10 2 0 0
dm-3 36240 0 639146 173982 178529 0 1428232 540632 0 229
dm-4 164 0 1136 610 35 0 76 155 0 0
dm-5 216 0 1240 817 166439 0 332884 262910 0 185
hda 0 0 0 0 0 0 0 0 0 0
fd0 0 0 0 0 0 0 0 0 0 0
md0 0 0 0 0 0 0 0 0 0 0
Right. Unless it's mounted as part of a logical volume, NFS doesn't
show up here.
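If you want to see how much NFS traffic the box is actually generating,
the client-side RPC counters are one place to look (assuming nfs-utils
is installed):

# nfsstat -c

or the raw counters in /proc/net/rpc/nfs.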
If it's not an I/O wait problem, then what can cause Squid to use 100%
of a CPU core?
For a 4 Mbit circuit on recent hardware, using lots (thousands) of regex
ACLs would do it. But...
I tried to clear the cache, but after an hour or so Squid began to use
as much CPU as usual (~100%).
...indicates to me that it's cache related. So I think it's either the
cache_dir type you are using (ufs) or the way the cache_dir is mounted
(NFS).
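For what it's worth, if you do end up wiping and recreating the cache
again (say, after moving it to a different filesystem), the usual
sequence is roughly:

# squid -k shutdown
# rm -rf /var/spool/squid/*
# squid -z
# /etc/init.d/squid start

(squid -z recreates the swap directories; start Squid however you
normally do on your distribution.)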
I'm not sure, but maybe it started after we upgraded our external link
from 2 Mbps to 4 Mbps.
I will try to move the Squid cache to a local disk, but Squid runs under
VMware Virtual Infrastructure, so if I move any of the virtual machine's
partitions from shared to local storage I won't be able to migrate the
Squid VM from one HA cluster node to the other (because the local
partitions on the cluster nodes are different from each other).
Then if changing the cache_dir type doesn't help, look into using AoE or
iSCSI.
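As a very rough sketch of the iSCSI approach (the portal address and
target name below are made up; the initiator commands are from
open-iscsi):

# iscsiadm -m discovery -t sendtargets -p 192.168.0.10
# iscsiadm -m node -T iqn.2008-01.com.example:squid-cache -p 192.168.0.10 --login

The LUN then appears as an ordinary /dev/sd* block device that you can
format and mount for the cache_dir, while the data itself stays on
shared storage, so the VM should still be able to move between HA nodes.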
Regards,
Nikita.
Chris