Squid version: squid-2.6.STABLE21-3

The server is a Xen virtual machine with 6 GB of RAM available to it.

Relevant lines in squid.conf:

hierarchy_stoplist cgi-bin ?
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
cache_mem 4096 MB
maximum_object_size 8192 KB
maximum_object_size_in_memory 4096 KB
cache_swap_low 95
cache_swap_high 96
cache_dir aufs /www/apps/squid/var/cache 4096 16 256
logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh %tr
access_log /www/logs/squid/access.log combined
cache_log /www/logs/squid/cache.log
cache_store_log /www/logs/squid/store.log
debug_options ALL,1 33,2
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320
negative_ttl 0
collapsed_forwarding on
refresh_stale_hit 5 seconds
half_closed_clients off
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
acl PURGE method PURGE
http_access allow manager localhost
http_access deny manager
http_access deny PURGE
http_access allow localhost
http_access allow all
http_reply_access allow all
icp_access allow all
httpd_suppress_version_string on
cachemgr_passwd none config
error_directory /www/apps/squid/errors
coredump_dir /var/spool/squid
minimum_expiry_time 15 seconds
max_filedesc 8192

Symptoms:

- High load average on the box, ranging from 6 to 10 during traffic hours.
- CPU iowait time during those hours is between 20% and 50%.
- SO_FAIL status codes seen in store.log.
- MaintainSwapSpace is continually running under a second. This appears to be normal, though, judging by our dev and stage Squid setups, which have no load.
- In squidaio_counts, the Queue value spikes upwards to 200 or more. The O'Reilly book mentions that if this number is greater than 5x the number of IO threads, Squid is overworked.
- Cache_dir storage size is constantly at the cache_swap_low value (94%). Does this mean Squid is continually garbage collecting and possibly causing the high IO? Originally we had the number at 90, but after reading some threads, we adjusted it to 94 for the low and 95 for the high, hoping to reduce IO by having a smaller amount of data garbage collected at a time. This change didn't have any impact.
- A couple of warnings in cache.log saying "squidaio_queue_request: WARNING - Disk I/O overloading".
- A high number of create.select_fail events on the store_io screen in the cache manager. This number is at 12% of the total IO calls.

From reading posts on the list by people with similar issues, the one suggestion we will implement next is configuring a second cache_dir to increase the number of threads available for IO. I wanted to know if you had any other suggestions for tweaks that could be made to alleviate the load on the box.

A couple of other tweaks we have already implemented are mounting the partition where the cache is stored with the noatime option and using tcmalloc in place of GNU malloc. I saw a recommendation to change store_dir_select_algorithm to round-robin, but from reading this
http://www.squid-cache.org/mail-archive/squid-users/200011/0794.html
it sounded like the change would increase response times.

- Justin Lintz
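
For reference, a rough sketch of what the planned second cache_dir might look like in squid.conf. The second path (/www/apps/squid/var/cache2) is a placeholder and does not exist on the box today, and the size and L1/L2 values are simply copied from the current cache_dir as an assumption, not a tuned recommendation:

```
# Existing cache_dir, unchanged
cache_dir aufs /www/apps/squid/var/cache 4096 16 256

# Hypothetical second cache_dir; the path is a placeholder and the
# size / L1 / L2 values are assumed to match the first one
cache_dir aufs /www/apps/squid/var/cache2 4096 16 256
```

The new directory structure would need to be created with "squid -z" before Squid is restarted with this configuration. If both directories end up on the same physical disk, this adds IO threads but no extra disk bandwidth.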