Re: squid 3.2.0.5 smp scaling issues

david@xxxxxxx · Sat, 26 Mar 2011 23:02:46 -0700 (PDT)

re-sending and adding -dev list

performance drops with each release going from 3.0 -> 3.1 -> 3.2, and in 
addition squid 3.2 scales poorly (it only reaches about 2x the single-worker 
performance at 4 workers, and drops off again beyond 5)

this makes it so that I actually get better performance on 3.0 than on 
3.2, even with multiple workers

David Lang

On Mon, 21 Mar 2011, david@xxxxxxx wrote:

Date: Mon, 21 Mar 2011 19:26:38 -0700 (PDT)
From: david@xxxxxxx
To: squid-users@xxxxxxxxxxxxxxx
Subject:  squid 3.2.0.5 smp scaling issues

test setup

box A running apache and ab

test against local IP address >13000 requests/sec

box B running squid, 8 2.3 GHz Opteron cores with 16G ram

non acl/cache-peer related lines in the config are (including typos from me 
manually entering this)

http_port 8000
icp_port 0
visible_hostname gromit1
cache_effective_user proxy
cache_effective_group proxy
append_domain .invalid.server.name
pid_filename /var/run/squid.pid
cache_dir null /tmp
client_db off
cache_access_log syslog squid
cache_log /var/log/squid/cache.log
cache_store_log none
coredump_dir none
no_cache deny all

results when requesting short html page

squid 3.0.STABLE12 4200 requests/sec
squid 3.1.11 2100 requests/sec
squid 3.2.0.5 1 worker 1400 requests/sec
squid 3.2.0.5 2 workers 2100 requests/sec
squid 3.2.0.5 3 workers 2500 requests/sec
squid 3.2.0.5 4 workers 2900 requests/sec
squid 3.2.0.5 5 workers 2900 requests/sec
squid 3.2.0.5 6 workers 2500 requests/sec
squid 3.2.0.5 7 workers 2000 requests/sec
squid 3.2.0.5 8 workers 1900 requests/sec
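
To make the scaling easier to read, the figures above can be turned into speedup and per-worker efficiency relative to the 1-worker run. This is only a small sketch over the numbers already listed; the names `short_page_rps` and `scaling_table` are made up for illustration.

```python
# Throughput figures copied from the short-page results above (requests/sec),
# keyed by the number of squid 3.2.0.5 workers.
short_page_rps = {1: 1400, 2: 2100, 3: 2500, 4: 2900,
                  5: 2900, 6: 2500, 7: 2000, 8: 1900}


def scaling_table(rps_by_workers):
    """Return (workers, rps, speedup, efficiency) rows.

    speedup    = throughput relative to the 1-worker run
    efficiency = speedup divided by the number of workers (1.0 = linear)
    """
    base = rps_by_workers[1]
    rows = []
    for workers in sorted(rps_by_workers):
        rps = rps_by_workers[workers]
        speedup = rps / base
        rows.append((workers, rps, speedup, speedup / workers))
    return rows


if __name__ == "__main__":
    for workers, rps, speedup, efficiency in scaling_table(short_page_rps):
        print(f"{workers} workers: {rps} req/s, "
              f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these numbers the best case (4 or 5 workers) is about 2.07x the single worker, and 8 workers come out at about 1.36x, so each added worker past 4 costs throughput rather than adding it.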

in all these tests the squid process was using 100% of the cpu

I also tried pulling a large file (100K instead of <50 bytes), on the thought 
that the short-page test may be bottlenecking on accepting connections, and 
that with requests that take more time to service squid could do better. 
however, what I found is that with 8 workers all 8 were using <50% of the CPU 
at 1000 requests/sec

local machine would do 7000 requests/sec to itself

1 worker 500 requests/sec
2 workers 957 requests/sec

from there it remained about 1000 requests/sec with the cpu utilization 
slowly dropping off (but not dropping as fast as it should with the number of 
cores available)
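
For scale, the large-file request rates above can be converted into payload bandwidth. This is a rough sketch only: it assumes "100K" means 100 KiB per response and ignores HTTP and TCP overhead, and the link speed between the two boxes is not stated in this message, so it does not by itself show where the limit is. The name `payload_mbit_per_sec` is hypothetical.

```python
def payload_mbit_per_sec(requests_per_sec, body_bytes):
    """Response-body bandwidth in Mbit/s (1 Mbit = 1,000,000 bits)."""
    return requests_per_sec * body_bytes * 8 / 1_000_000


# Assumption: "100K" in the test description is taken as 100 KiB.
BODY_BYTES = 100 * 1024

# Request rates copied from the large-file results above.
large_file_rps = [
    ("1 worker", 500),
    ("2 workers", 957),
    ("3+ workers (plateau)", 1000),
    ("local machine to itself", 7000),
]

if __name__ == "__main__":
    for label, rps in large_file_rps:
        mbit = payload_mbit_per_sec(rps, BODY_BYTES)
        print(f"{label}: {rps} req/s is about {mbit:.0f} Mbit/s of bodies")
```

Under that assumption the plateau of about 1000 requests/sec corresponds to roughly 820 Mbit/s of response bodies, which may be worth comparing against the link between box A and box B before attributing the plateau to squid alone.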

so it looks like there is some significant bottleneck in version 3.2 that 
makes the SMP support fairly ineffective.

in reading the wiki page at wiki.squid-cache.org/Features/SmpScale I see you 
worrying about fairness between workers. If you have put in code to try and 
ensure fairness, you may want to remove it and see what happens to 
performance. what you are describing on that page in terms of fairness is 
what I would expect from a 'first-come-first-served' approach to multiple 
processes grabbing new connections. The worker that last ran is hot in the 
cache and so has an 'unfair' advantage in noticing and processing the new 
request, but as that worker gets busier, it will be spending more time 
servicing the request and the other processes will get more of a chance to 
grab the new connection, so it will appear unfair under light load, but 
become more fair under heavy load.
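
The 'first-come-first-served' approach described above can be sketched as a set of forked workers that all call accept() on one shared listening socket, with no fairness logic at all. This is not how squid is implemented (squid workers use a non-blocking event loop, and the blocking accept() here is a simplification); it is a minimal Unix-only illustration, and `run_prefork_demo` is a hypothetical name.

```python
import os
import signal
import socket
from collections import Counter


def run_prefork_demo(num_workers=4, num_connections=200):
    """Fork workers sharing one listening socket; return a Counter mapping
    worker index -> number of connections that worker accepted."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
    listener.listen(128)
    address = listener.getsockname()

    pids = []
    for index in range(num_workers):  # index must fit in one byte (< 256)
        pid = os.fork()
        if pid == 0:
            # Child: whichever worker the kernel wakes first gets the
            # connection. Nothing here tries to balance the load.
            try:
                while True:
                    conn, _ = listener.accept()
                    conn.sendall(bytes([index]))  # tell the client who served it
                    conn.close()
            finally:
                os._exit(0)  # never fall through into the parent's code
        pids.append(pid)

    served = Counter()
    try:
        # Parent acts as the client and tallies which worker answered.
        for _ in range(num_connections):
            with socket.create_connection(address) as client:
                data = client.recv(1)
                if data:
                    served[data[0]] += 1
    finally:
        for pid in pids:
            os.kill(pid, signal.SIGTERM)
            os.waitpid(pid, 0)
        listener.close()
    return served


if __name__ == "__main__":
    print(run_prefork_demo())
```

How the connections end up split between workers depends on the kernel and on how busy each worker is, so the counts will differ from run to run and from system to system. A single sequential client like this one is a light load, which is the case where the split would be expected to look least even.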

David Lang