Andreev Nikita wrote:
Why does Squid eat 100% of a processor if the problem is in the FS?
How is your cache_dir defined? aufs is generally a better choice than
ufs, diskd may still have some stability issues under load, and coss is
a good supplement as a small-object cache. Conceivably, if Squid is set
up with a ufs cache_dir mounted over NFS, it's spending a lot of time in
a wait state, blocked while the I/O completes.
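(A quick way to check what the cache_dir actually sits on, assuming it's
the /var/spool/squid path you show further down, is something like:

# df -T /var/spool/squid
# mount | grep /var/spool/squid

The first shows the filesystem type, the second the mount options.)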
For 6 days of uptime:
# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 92 104052 235704 2309956 0 0 3 43 24 33 10 16 73 1 0
As you can see, the system has spent only 1% of its CPU time in I/O wait
(the cpu-wa column).
This also shows the CPU has been 73% idle on average. If I'm not
mistaken, you stated that you have a two-CPU (two-core) system, so even
with one core maxed out the averages would still show around 50% idle.
Running "vmstat 2" while you are experiencing the load will give more
insight.
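For example, to take 30 two-second samples during a busy period:

# vmstat 2 30

Per-core usage (to see whether one core really is pegged) is easier to
read from "top" with the 1 key pressed, or "mpstat -P ALL 2" if sysstat
is installed.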
My cache_dir directive looks like:
cache_dir ufs /var/spool/squid 16384 64 1024
Make sure your Squid can support it (check the output of "squid -v" for
aufs) and change this line to...
cache_dir aufs /var/spool/squid 16384 64 1024
...to enable asynchronous cache accesses.
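For example:

# squid -v

and look for aufs in the --enable-storeio=... part of the configure
options. As far as I know aufs uses the same on-disk layout as ufs, so
the existing cache can be reused after a restart; you shouldn't need to
recreate it just for this change.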
# vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
total merged sectors ms total merged sectors ms cur sec
ram0 0 0 0 0 0 0 0 0 0 0
ram1 0 0 0 0 0 0 0 0 0 0
ram2 0 0 0 0 0 0 0 0 0 0
ram3 0 0 0 0 0 0 0 0 0 0
ram4 0 0 0 0 0 0 0 0 0 0
ram5 0 0 0 0 0 0 0 0 0 0
ram6 0 0 0 0 0 0 0 0 0 0
ram7 0 0 0 0 0 0 0 0 0 0
ram8 0 0 0 0 0 0 0 0 0 0
ram9 0 0 0 0 0 0 0 0 0 0
ram10 0 0 0 0 0 0 0 0 0 0
ram11 0 0 0 0 0 0 0 0 0 0
ram12 0 0 0 0 0 0 0 0 0 0
ram13 0 0 0 0 0 0 0 0 0 0
ram14 0 0 0 0 0 0 0 0 0 0
ram15 0 0 0 0 0 0 0 0 0 0
sda 50114 8198 1197972 239114 771044 986524 13061742 1616345 0 1239
sdb 125 1430 2383 100 3 20 184 43 0 0
sdc 547181 13909 6116481 6209599 2893943 6771249 77505040 42580590 0 8027
dm-0 6659 0 143594 45401 528574 0 4228592 1248409 0 269
dm-1 13604 0 408122 82828 883993 0 7071944 3118925 0 677
dm-2 150 0 1132 387 2 0 10 2 0 0
dm-3 36240 0 639146 173982 178529 0 1428232 540632 0 229
dm-4 164 0 1136 610 35 0 76 155 0 0
dm-5 216 0 1240 817 166439 0 332884 262910 0 185
hda 0 0 0 0 0 0 0 0 0 0
fd0 0 0 0 0 0 0 0 0 0 0
md0 0 0 0 0 0 0 0 0 0 0
Right. Unless it's mounted as part of a logical volume, NFS doesn't
show up here.
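If you want to see how much NFS traffic the box is actually generating,
the client-side RPC counters are one place to look (assuming nfs-utils
is installed):

# nfsstat -c

or the raw counters in /proc/net/rpc/nfs.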
If it's not an I/O wait problem, then what can cause Squid to use 100%
of a CPU core?
For a 4 Mbit circuit on recent hardware, using lots (thousands) of regex
ACLs would do it. But...
I tried to clear the cache, but after an hour or so Squid began to use
as much CPU as usual (~100%).
...indicates to me that it's cache related. So I think it's either the
cache_dir type you are using (ufs) or the way the cache_dir is mounted
(NFS).
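For what it's worth, if you do end up wiping and recreating the cache
again (say, after moving it to a different filesystem), the usual
sequence is roughly:

# squid -k shutdown
# rm -rf /var/spool/squid/*
# squid -z
# /etc/init.d/squid start

(squid -z recreates the swap directories; start Squid however you
normally do on your distribution.)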
I'm not sure, but maybe it started after we upgraded our external link
from 2 Mbps to 4 Mbps.
I will try to move the Squid cache to a local disk, but Squid runs under
VMware Virtual Infrastructure, so if I move any of the virtual machine's
partitions from shared to local storage I won't be able to migrate the
Squid VM from one HA cluster node to the other (because the local
partitions on the cluster nodes are different from each other).
Then if changing the cache_dir type doesn't help, look into using AoE or
iSCSI.
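As a very rough sketch of the iSCSI approach (the portal address and
target name below are made up; the initiator commands are from
open-iscsi):

# iscsiadm -m discovery -t sendtargets -p 192.168.0.10
# iscsiadm -m node -T iqn.2008-01.com.example:squid-cache -p 192.168.0.10 --login

The LUN then appears as an ordinary /dev/sd* block device that you can
format and mount for the cache_dir, while the data itself stays on
shared storage, so the VM should still be able to move between HA nodes.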
Regards,
Nikita.
Chris