Re: high load issues

Justin Lintz <jlintz@xxxxxxxxx> · Wed, 10 Feb 2010 12:41:29 -0500

We're seeing the symptoms across 4 servers on different hardware.
What would be the reason for adjusting the cache_swap_high to 96?
Thanks

- Justin Lintz

On Wed, Feb 10, 2010 at 11:45 AM, Luis Daniel Lucio Quiroz
<luis.daniel.lucio@xxxxxxxxx> wrote:
> On Wednesday, 10 February 2010 10:36:40, Justin Lintz wrote:
>> Squid ver: squid-2.6.STABLE21-3
>> The server is a xen virtual with 6GB of ram available to it.
>>
>> relevant lines in Squid.conf:
>>
>> hierarchy_stoplist cgi-bin ?
>> acl apache rep_header Server ^Apache
>> broken_vary_encoding allow apache
>> cache_mem 4096 MB
>> maximum_object_size 8192 KB
>> maximum_object_size_in_memory 4096 KB
>> cache_swap_low 95
>> cache_swap_high 96
>> cache_dir aufs /www/apps/squid/var/cache 4096 16 256
>> logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st
>> "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh %tr
>> access_log /www/logs/squid/access.log combined
>> cache_log /www/logs/squid/cache.log
>> cache_store_log /www/logs/squid/store.log
>> debug_options ALL,1 33,2
>> refresh_pattern ^ftp:           1440    20%     10080
>> refresh_pattern ^gopher:        1440    0%      1440
>> refresh_pattern .               0       20%     4320
>> negative_ttl 0
>> collapsed_forwarding on
>> refresh_stale_hit 5 seconds
>> half_closed_clients off
>> acl all src 0.0.0.0/0.0.0.0
>> acl manager proto cache_object
>> acl localhost src 127.0.0.1/255.255.255.255
>> acl to_localhost dst 127.0.0.0/8
>> acl SSL_ports port 443
>> acl Safe_ports port 80          # http
>> acl Safe_ports port 21          # ftp
>> acl Safe_ports port 443         # https
>> acl Safe_ports port 70          # gopher
>> acl Safe_ports port 210         # wais
>> acl Safe_ports port 1025-65535  # unregistered ports
>> acl Safe_ports port 280         # http-mgmt
>> acl Safe_ports port 488         # gss-http
>> acl Safe_ports port 591         # filemaker
>> acl Safe_ports port 777         # multiling http
>> acl CONNECT method CONNECT
>> acl PURGE method PURGE
>> http_access allow manager localhost
>> http_access deny manager
>> http_access deny PURGE
>> http_access allow localhost
>> http_access allow all
>> http_reply_access allow all
>> icp_access allow all
>> httpd_suppress_version_string on
>> cachemgr_passwd none config
>> error_directory /www/apps/squid/errors
>> coredump_dir /var/spool/squid
>> minimum_expiry_time 15 seconds
>> max_filedesc 8192
>>
>> Symptoms:
>> - High load avg on box ranging from 6-10 during traffic hours
>> - CPU iowait time during those times will be between 20-50%
>> - SO_FAIL status codes seen in store.log
>> - MaintainSwapSpace is continually running under a second. This
>> appears to be normal, though, looking at our dev and stage squid setups,
>> which have no load.
>> - From squidaio_counts, seeing the Queue spike upwards to 200 or
>> more.  I saw a mention in the O'Reilly book that if this number is
>> greater than 5x the number of IO threads, then squid is overworked.
>> - Cache_dir storage size is constantly at the cache_swap_low value
>> (94%).  Does this mean squid is continually garbage collecting and
>> possibly causing the high IO?  Originally we had the number at 90, but
>> after reading some threads, we adjusted the number to 94 for the low and
>> 95 for the high, hoping to reduce IO with a smaller amount of data being
>> garbage collected.  This change didn't have any impact.
>> - Saw a couple of warnings in cache.log saying
>> "squidaio_queue_request: WARNING - Disk I/O overloading"
>> - High number of create.select_fail events in store_io screen in the
>> cache manager.  Seeing this number at 12% of the total IO calls.
>>
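As a rough check of the rule of thumb quoted in the symptoms above (queue greater than 5x the number of IO threads means squid is overworked), here is a minimal Python sketch. The queue value of 200 comes from the report; the thread count of 16 is only an assumed placeholder, since the thread does not state how many IO threads this build uses, and the function names are made up for illustration.

```python
def aio_queue_limit(io_threads: int, factor: int = 5) -> int:
    """Queue length above which the rule of thumb calls squid overworked."""
    return factor * io_threads


def is_overworked(queue_len: int, io_threads: int, factor: int = 5) -> bool:
    """True when the observed queue exceeds factor * io_threads."""
    return queue_len > aio_queue_limit(io_threads, factor)


if __name__ == "__main__":
    observed_queue = 200  # spike reported from squidaio_counts
    assumed_threads = 16  # ASSUMPTION: placeholder, not taken from this setup
    print("limit:", aio_queue_limit(assumed_threads))
    print("overworked:", is_overworked(observed_queue, assumed_threads))
```

Under this rule a queue of 200 only stays within the limit with 40 or more IO threads, which is consistent with the idea of adding IO threads through a second cache_dir.
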
>> From reading posts on the list from people with similar issues, one
>> suggestion we will implement next is configuring a second
>> cache_dir to increase the number of threads available for IO.
>>
>> I wanted to know if you had any other suggestions for tweaks that
>> could be made that would hopefully alleviate the load on the box.
>>
>> A couple of other tweaks we have currently implemented are putting the
>> noatime option on the partition where the cache is stored and using
>> tcmalloc in place of gnu malloc.
>>
>> I saw a recommendation of changing the store_dir_select_algorithm to
>> round-robin but from reading this
>> http://www.squid-cache.org/mail-archive/squid-users/200011/0794.html
>> it sounded like the change would increase the response times.
>>
>>
>>
>>
>> - Justin Lintz
> Change your
> cache_swap_high 96
>
> to something higher; 98 could work.
> Also look for hardware errors.
>
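To put numbers on the watermark suggestion: cache_swap_low and cache_swap_high are percentages of the cache_dir size, so the gap between them can be expressed in MB. The sketch below is only illustrative arithmetic using the 4096 MB cache_dir and the 95/96 values from the quoted squid.conf, plus the suggested 98. The function names are hypothetical and it does not model how squid actually schedules replacement.

```python
def watermark_mb(cache_dir_mb: int, percent: int) -> float:
    """Disk usage in MB at which a given watermark percentage is reached."""
    return cache_dir_mb * percent / 100


def watermark_gap_mb(cache_dir_mb: int, low_pct: int, high_pct: int) -> float:
    """Size in MB of the window between the low and high watermarks."""
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("expected 0 <= low_pct <= high_pct <= 100")
    return cache_dir_mb * (high_pct - low_pct) / 100


if __name__ == "__main__":
    cache_dir_mb = 4096  # from the cache_dir line in the quoted config
    for low, high in [(95, 96), (95, 98)]:
        print(f"low={low}% high={high}%:",
              f"starts at {watermark_mb(cache_dir_mb, low)} MB,",
              f"window of {watermark_gap_mb(cache_dir_mb, low, high)} MB")
```

With a single percentage point between the marks, the window on a 4096 MB cache_dir is small, so raising cache_swap_high widens the range over which replacement ramps up.
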