high load issues

Justin Lintz <jlintz@xxxxxxxxx> · Wed, 10 Feb 2010 11:36:40 -0500

Squid ver: squid-2.6.STABLE21-3
The server is a xen virtual with 6GB of ram available to it.

Relevant lines in squid.conf:

hierarchy_stoplist cgi-bin ?
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
cache_mem 4096 MB
maximum_object_size 8192 KB
maximum_object_size_in_memory 4096 KB
cache_swap_low 95
cache_swap_high 96
cache_dir aufs /www/apps/squid/var/cache 4096 16 256
logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh %tr
access_log /www/logs/squid/access.log combined
cache_log /www/logs/squid/cache.log
cache_store_log /www/logs/squid/store.log
debug_options ALL,1 33,2
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320
negative_ttl 0
collapsed_forwarding on
refresh_stale_hit 5 seconds
half_closed_clients off
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT
acl PURGE method PURGE
http_access allow manager localhost
http_access deny manager
http_access deny PURGE
http_access allow localhost
http_access allow all
http_reply_access allow all
icp_access allow all
httpd_suppress_version_string on
cachemgr_passwd none config
error_directory /www/apps/squid/errors
coredump_dir /var/spool/squid
minimum_expiry_time 15 seconds
max_filedesc 8192
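For reference, cache_swap_low and cache_swap_high are percentages of the cache_dir size. With the 4096 MB aufs cache_dir above, they correspond to the absolute sizes computed below. This is only a small arithmetic sketch. According to the squid.conf documentation, replacement starts once usage passes the low mark and becomes more aggressive as usage approaches the high mark.

```python
# Convert cache_swap_low / cache_swap_high percentages into absolute sizes
# for a single cache_dir. Values are taken from the config excerpt above.

def watermark_mb(cache_dir_mb: float, percent: float) -> float:
    """Absolute size in MB that corresponds to a cache_swap_* percentage."""
    return cache_dir_mb * percent / 100.0

CACHE_DIR_MB = 4096  # cache_dir aufs /www/apps/squid/var/cache 4096 16 256
SWAP_LOW = 95        # cache_swap_low
SWAP_HIGH = 96       # cache_swap_high

low_mb = watermark_mb(CACHE_DIR_MB, SWAP_LOW)
high_mb = watermark_mb(CACHE_DIR_MB, SWAP_HIGH)

print(f"low  watermark: {low_mb:.2f} MB")
print(f"high watermark: {high_mb:.2f} MB")
print(f"band between the two: {high_mb - low_mb:.2f} MB")
```

With these settings, the band between the two marks is only about 41 MB of a 4 GB cache.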

Symptoms:
- High load average on the box, ranging from 6 to 10 during traffic hours
- CPU iowait during those hours is between 20% and 50%
- SO_FAIL status codes seen in store.log
- MaintainSwapSpace runs continually, at intervals of under a second.
This appears to be normal, though, judging by our dev and stage Squid
setups, which have no load.
- In squidaio_counts, the Queue spikes to 200 or more.  The O'Reilly
book mentions that if this number is greater than 5x the number of I/O
threads, Squid is overworked.
- The cache_dir storage size is constantly at the cache_swap_low value
(94%).  Does this mean Squid is continually garbage collecting and
possibly causing the high I/O?  Originally we had the value at 90, but
after reading some threads, we adjusted it to 94 for the low and 95 for
the high, hoping to reduce I/O by having a smaller amount of data
garbage collected at a time.  This change didn't have any impact.
- A couple of warnings in cache.log saying
"squidaio_queue_request: WARNING - Disk I/O overloading"
- A high number of create.select_fail events on the store_io page of the
cache manager: about 12% of the total I/O calls.

From reading the list archives about people with similar issues, I see
that one suggestion, which we will implement next, is configuring a
second cache_dir to increase the number of threads available for I/O.
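The O'Reilly rule of thumb (queue length above 5x the number of I/O threads means overload) can be combined with the second-cache_dir plan in a quick calculation. This is a sketch under stated assumptions. THREADS_PER_CACHE_DIR = 16 is a hypothetical figure, and the linear scaling of threads with the number of cache_dirs is also an assumption. The real thread count depends on how this Squid build was configured.

```python
# Rule of thumb quoted from the O'Reilly book: the aufs request queue is
# overloaded when it is longer than 5x the number of I/O threads.
OVERLOAD_FACTOR = 5

# ASSUMPTION (hypothetical value): each cache_dir contributes this many
# I/O threads. Check the actual thread count of your Squid build.
THREADS_PER_CACHE_DIR = 16


def queue_threshold(n_cache_dirs: int,
                    threads_per_dir: int = THREADS_PER_CACHE_DIR) -> int:
    """Queue length above which the rule of thumb flags overload."""
    return OVERLOAD_FACTOR * n_cache_dirs * threads_per_dir


def is_overloaded(queue_len: int, n_cache_dirs: int) -> bool:
    """True if the observed queue length exceeds the threshold."""
    return queue_len > queue_threshold(n_cache_dirs)


observed_queue = 200  # peak Queue value seen in squidaio_counts
for dirs in (1, 2):
    print(f"{dirs} cache_dir(s): threshold={queue_threshold(dirs)}, "
          f"overloaded={is_overloaded(observed_queue, dirs)}")
```

Under the assumed 16 threads per directory, a second cache_dir raises the threshold from 80 to 160. A peak of 200 would still be above it.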

I wanted to know whether you have any other suggestions for tweaks that
might alleviate the load on the box.

A couple of other tweaks we have already implemented are mounting the
partition where the cache is stored with the noatime option and using
tcmalloc in place of GNU malloc.

I saw a recommendation to change store_dir_select_algorithm to
round-robin, but from reading
http://www.squid-cache.org/mail-archive/squid-users/200011/0794.html
it sounded like the change would increase response times.
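The two store_dir_select_algorithm choices, least-load and round-robin, differ in whether they consider how busy each directory is. The sketch below is a simplified, hypothetical model and not Squid's own code. The `load` field stands in for whatever load figure each directory reports. The directory names and `max_obj_kb` limits are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheDir:
    name: str
    max_obj_kb: int  # largest object this directory will accept
    load: int        # simplified stand-in for pending disk I/O


def select_least_load(dirs: List[CacheDir], obj_kb: int) -> Optional[CacheDir]:
    """Pick the eligible directory with the lowest current load."""
    eligible = [d for d in dirs if obj_kb <= d.max_obj_kb]
    return min(eligible, key=lambda d: d.load, default=None)


class RoundRobin:
    """Rotate through eligible directories, ignoring their load."""

    def __init__(self, dirs: List[CacheDir]):
        self.dirs = dirs
        self.next_idx = 0

    def select(self, obj_kb: int) -> Optional[CacheDir]:
        n = len(self.dirs)
        for i in range(n):
            d = self.dirs[(self.next_idx + i) % n]
            if obj_kb <= d.max_obj_kb:
                # Resume after the chosen directory on the next call.
                self.next_idx = (self.next_idx + i + 1) % n
                return d
        return None


dirs = [CacheDir("disk1", 8192, load=40), CacheDir("disk2", 8192, load=2)]
print(select_least_load(dirs, 100).name)
rr = RoundRobin(dirs)
print([rr.select(100).name for _ in range(3)])
```

In this model, round-robin keeps sending every other write to the busy `disk1`. Least-load steers writes to the idle `disk2`. That is one plausible reason the linked thread expected slower responses from round-robin.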

- Justin Lintz