Re: Is it true that even threaded Squid can't benefit from SMP systems?

Chris Woodfield <rekoil@xxxxxxxxxxxxx> · Tue, 19 May 2009 10:34:57 -0400

A couple of lessons learned on my end, both from my own experience and
picked up from various squid-users threads:

I've said this before, but never underestimate the value of the kernel
page cache. If you need to scale the box, put in as much RAM as you
can afford.

Also, as has been said before, squid + RAID = PAIN (particularly
RAID5). Performance will be much better if you can set up multiple
physical disks under separate cache_dirs, thus allowing async reads to
take place in parallel. If disk redundancy is a must, stick with
RAID 1 pairs (multiple RAID 1 pairs work well, particularly with a
hardware controller).

If your traffic load is mostly small objects (roughly under 1 MB),
consider using COSS storage as an alternative to AUFS. It will give
you much more bang for the buck if you're serving large numbers of
small objects, since it eliminates the overhead of the millions of
open()/close() system calls you'd see with AUFS.
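
A much-simplified Python sketch of where those system calls come from, assuming a POSIX system (for `os.pread`/`os.pwrite`): an AUFS-style store opens and closes one file per object, while a COSS-style store keeps one file open, appends objects into it, and finds them again through an in-memory offset/length index. Real COSS also manages stripes, memory buffering and cyclic reuse of the file, none of which is modeled here; the class names are hypothetical.

```python
import os
import tempfile


class FilePerObjectStore:
    """AUFS-like layout: each object is its own file, so every store and
    every hit costs an open() and a close()."""

    def __init__(self, root):
        self.root = root
        self.opens = 0

    def put(self, key, data):
        self.opens += 1
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)

    def get(self, key):
        self.opens += 1
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()


class SingleFileStore:
    """COSS-like idea, heavily simplified: one file opened once; objects are
    appended and located through an in-memory (offset, length) index."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        self.opens = 1
        self.index = {}
        self.end = 0

    def put(self, key, data):
        os.pwrite(self.fd, data, self.end)  # positional write, no new open()
        self.index[key] = (self.end, len(data))
        self.end += len(data)

    def get(self, key):
        offset, length = self.index[key]
        return os.pread(self.fd, length, offset)

    def close(self):
        os.close(self.fd)


with tempfile.TemporaryDirectory() as tmp:
    per_object = FilePerObjectStore(tmp)
    single = SingleFileStore(os.path.join(tmp, "stripe.dat"))
    for i in range(1000):
        body = f"small object {i}".encode()
        for store in (per_object, single):
            store.put(f"obj{i}", body)
            assert store.get(f"obj{i}") == body
    single.close()
    print("open() calls, file per object:", per_object.opens)  # 2000
    print("open() calls, single file:", single.opens)  # 1
```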

If you find yourself hitting a single-core CPU bottleneck due to
squid's main loop, you can run multiple squids on one box, although
each one needs its own cache storage. If you go that route, consider
configuring one or more "front-end" squids that route requests to
multiple "back-end" parent caches via CARP; since CARP sends each URL
to exactly one parent, objects aren't duplicated across the back-ends.

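
What makes the front-end/back-end split work is that every front-end maps a given URL to the same back-end. CARP does this with a highest-score (rendezvous) hash: score each parent against the URL and pick the winner. The Python sketch below shows that technique with MD5 as a generic stand-in hash; the actual CARP specification defines its own hash functions and per-parent load factors, which are not reproduced here, and the parent names are hypothetical.

```python
import hashlib

PARENTS = ["backend1:3128", "backend2:3128", "backend3:3128"]  # hypothetical


def score(parent, url):
    """Deterministic pseudo-random score for a (parent, URL) pair.
    MD5 is a stand-in here, not the hash CARP actually specifies."""
    digest = hashlib.md5(f"{parent} {url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_parent(url, parents):
    """Rendezvous hashing: the parent with the highest score owns the URL."""
    return max(parents, key=lambda p: score(p, url))


urls = [f"http://example.com/img/{i}.png" for i in range(1000)]
owners = {u: pick_parent(u, PARENTS) for u in urls}

# Every front-end computes the same owner, so each object is cached once.
for parent in PARENTS:
    print(parent, sum(1 for o in owners.values() if o == parent))

# If backend3 goes away, only the URLs it owned move to another parent.
survivors = PARENTS[:2]
moved = sum(1 for u in urls if pick_parent(u, survivors) != owners[u])
print("URLs remapped after losing backend3:", moved)
```

Adding or removing a back-end therefore reshuffles only the objects that back-end owned, whereas a naive `hash(url) % N` scheme would remap most URLs.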
HTH,

-Chris

On May 19, 2009, at 8:47 AM, rihad wrote:

> Jeff Pang wrote:
>> rihad:
>>> But what about POSIX threads & async I/O? (./configure
>>> --enable-async-io=2 ...) Don't they take advantage of multiple
>>> CPUs/cores/cache_dirs?
>>
>> Yes, async I/O benefits from multiple CPUs for disk I/O, if you're
>> using it. Squid's main daemon is a single process, though, and gains
>> nothing from an SMP system.
>
> Since disk I/O is often the bottleneck (given enough RAM), can it be
> said that Squid, thanks to async I/O, mostly scales well with the
> number of CPUs? It can issue several disk I/O operations
> simultaneously and asynchronously, and proceed with the main loop
> without waiting for them to complete. If so, that part of the FAQ
> needs updating, I guess.
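
On the async I/O question: the pattern being described is a single main loop that hands blocking disk reads to a pool of worker threads and keeps running while they are in flight. The Python sketch below imitates that with `asyncio` and a `ThreadPoolExecutor`; it is an analogy, not Squid's C implementation, and `blocking_disk_read`, the thread count and the 50 ms sleep are assumptions.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_disk_read(key):
    """Stand-in for a blocking read() on a cache_dir, run on an I/O thread."""
    time.sleep(0.05)
    return f"data for {key}"


async def main_loop(keys, io_threads=4):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=io_threads) as io_pool:
        # Queue every read on the I/O threads; submitting does not block.
        pending = [loop.run_in_executor(io_pool, blocking_disk_read, k)
                   for k in keys]
        ticks = 0
        # Meanwhile the single main loop keeps iterating over other work.
        while not all(f.done() for f in pending):
            ticks += 1
            await asyncio.sleep(0.005)
        results = [f.result() for f in pending]
    return results, ticks


results, ticks = asyncio.run(main_loop([f"obj{i}" for i in range(8)]))
print(len(results), "reads completed;", ticks, "main-loop iterations meanwhile")
```

Only the time the I/O threads spend blocked in the kernel is overlapped. Whatever CPU work the main loop itself does still runs on a single core, which is the limit Jeff describes.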