Re: cache disk failure handling?

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Tue, 29 Jan 2008 14:08:36 +1300

Chris Woodfield wrote:
Hi,

Reading the squid FAQ, it's obvious to me that putting cache_dirs on a 
RAID (particularly RAID5) has serious performance penalties and is 
highly discouraged. However, what's not as clear is how squid deals with 
single-disk failures and whether or not it handles failures gracefully 
enough to obviate the need for RAID.

If I have a squid running multiple cache_dirs on single disks, and one 
disk suffers a failure, how does squid respond? Will it simply stop 
using that cache_dir and soldier on, or can this cause an application 
crash?

Probably crash. Unfortunately, it will then restart itself and crash
again. Repeat...

Also, when starting up squid, what is the effect of an unavailable 
cache_dir? I'm thinking of a situation where squid is restarted before a 
bad disk can be replaced.

If a cache_dir is completely absent/missing, squid will throw up a
message about needing the -z option to create the cache, and then exit.

It should not be too much work to make the code ignore an individual
missing cache_dir and carry on as long as at least one is present.
If you are interested in sponsoring it, let us know.
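
To make the idea concrete, here is a rough Python sketch of that
"start if at least one cache_dir is present" check. It is not Squid
code, and Squid does not do this today. The function names
(parse_cache_dirs, should_start) are made up for the illustration, and
it only tests whether each directory exists, not whether the disk
behind it is healthy.

```python
import os


def parse_cache_dirs(conf_text):
    """Return the directory named on each cache_dir line of a squid.conf.

    Assumes the usual layout: cache_dir <type> <directory> <options...>
    """
    paths = []
    for line in conf_text.splitlines():
        # Drop comments, then split the remainder on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "cache_dir":
            paths.append(fields[2])
    return paths


def should_start(paths, exists=os.path.isdir):
    """Decide whether to start, given the configured cache_dir paths.

    Returns (ok, usable, missing). ok is True when at least one
    directory is present; the missing ones would simply be skipped.
    The exists argument is a hook so the check can be tested
    without touching real disks.
    """
    usable = [p for p in paths if exists(p)]
    missing = [p for p in paths if not exists(p)]
    return len(usable) > 0, usable, missing
```

Something like this could also be run by hand or from an init script
before a restart: feed it the text of squid.conf, and if should_start()
reports any missing directories, comment those cache_dir lines out
before starting squid.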

If squid does have problems here, could using pairs of RAID1 partitions 
be an acceptable compromise, with the cost of reduced total storage?

It's the disk write duplication that slows the HDD, and thus squid,
down on all/most TCP_MISS objects. RAID without the duplication is much
less of a penalty, but still leaves you with the problem of a partially
corrupt cache_dir possibly causing crashes.

Amos
--
Please use Squid 2.6STABLE17+ or 3.0STABLE1+
There are serious security advisories out on all earlier releases.