Re: high load issues

Luis Daniel Lucio Quiroz <luis.daniel.lucio@xxxxxxxxx> · Wed, 10 Feb 2010 10:45:54 -0600

On Wednesday 10 February 2010 10:36:40, Justin Lintz wrote:
> Squid ver: squid-2.6.STABLE21-3
> The server is a xen virtual with 6GB of ram available to it.
> 
> relevant lines in Squid.conf:
> 
> hierarchy_stoplist cgi-bin ?
> acl apache rep_header Server ^Apache
> broken_vary_encoding allow apache
> cache_mem 4096 MB
> maximum_object_size 8192 KB
> maximum_object_size_in_memory 4096 KB
> cache_swap_low 95
> cache_swap_high 96
> cache_dir aufs /www/apps/squid/var/cache 4096 16 256
> logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh %tr
> access_log /www/logs/squid/access.log combined
> cache_log /www/logs/squid/cache.log
> cache_store_log /www/logs/squid/store.log
> debug_options ALL,1 33,2
> refresh_pattern ^ftp:           1440    20%     10080
> refresh_pattern ^gopher:        1440    0%      1440
> refresh_pattern .               0       20%     4320
> negative_ttl 0
> collapsed_forwarding on
> refresh_stale_hit 5 seconds
> half_closed_clients off
> acl all src 0.0.0.0/0.0.0.0
> acl manager proto cache_object
> acl localhost src 127.0.0.1/255.255.255.255
> acl to_localhost dst 127.0.0.0/8
> acl SSL_ports port 443
> acl Safe_ports port 80          # http
> acl Safe_ports port 21          # ftp
> acl Safe_ports port 443         # https
> acl Safe_ports port 70          # gopher
> acl Safe_ports port 210         # wais
> acl Safe_ports port 1025-65535  # unregistered ports
> acl Safe_ports port 280         # http-mgmt
> acl Safe_ports port 488         # gss-http
> acl Safe_ports port 591         # filemaker
> acl Safe_ports port 777         # multiling http
> acl CONNECT method CONNECT
> acl PURGE method PURGE
> http_access allow manager localhost
> http_access deny manager
> http_access deny PURGE
> http_access allow localhost
> http_access allow all
> http_reply_access allow all
> icp_access allow all
> httpd_suppress_version_string on
> cachemgr_passwd none config
> error_directory /www/apps/squid/errors
> coredump_dir /var/spool/squid
> minimum_expiry_time 15 seconds
> max_filedesc 8192
> 
> Symptoms:
> - High load avg on box ranging from 6-10 during traffic hours
> - CPU iowait during these times is between 20% and 50%
> - SO_FAIL status codes seen in store.log
>  - MaintainSwapSpace is continually running under a second. This
> appears to be normal though looking at our dev and stage squid setups
> which have no load.
> - From squidaio_counts, seeing the Queue spike upwards to 200 or
> more.  I saw a mention in the O'Reilly book that if this number is
> greater than 5x the number of IO threads, then squid is overworked.
> - Cache_dir storage size is constantly at the cache_swap_low value
> (94%).  Does this mean squid is continually garbage collecting and
> possibly causing the high IO?  Originally we had the number at 90, but
> after reading some threads, adjusted the number to 94 for the low and
> 95 for the high, hoping to reduce IO by having a smaller amount of
> data garbage collected at a time.  This change didn't have any impact.
> - Saw a couple of warnings in cache.log saying
> "squidaio_queue_request: WARNING - Disk I/O overloading"
> - High number of create.select_fail events in store_io screen in the
> cache manager.  Seeing this number at 12% of the total IO calls.
> 
> From reading posts on the list from people with similar issues, one
> suggestion we will implement next is configuring a second
> cache_dir to increase the number of threads available for IO.
> 
> I wanted to know if you had any other suggestions for tweaks that
> could be made that would hopefully alleviate the load on the box.
> 
> A couple of other tweaks we have currently implemented are putting the
> noatime option on the partition where the cache is stored and using
> tcmalloc in place of GNU malloc.
> 
> I saw a recommendation of changing the store_dir_select_algorithm to
> round-robin but from reading this
> http://www.squid-cache.org/mail-archive/squid-users/200011/0794.html
> it sounded like the change would increase the response times.
> 
> 
> 
> 
> - Justin Lintz
Change your
cache_swap_high 96

to something higher; 98 might work.
Also check for hardware errors.
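
A rough sketch of how this might look in squid.conf, combining the higher
cache_swap_high with the second cache_dir you said you plan to add. The
second path (/www/apps/squid/var/cache2) is hypothetical, ideally on a
separate physical disk, and 98 is only a starting value to experiment with,
not something I have tested on your setup:

```
# Widen the gap between the low and high water marks (was 95/96).
cache_swap_low 95
cache_swap_high 98

# Existing cache directory, unchanged.
cache_dir aufs /www/apps/squid/var/cache 4096 16 256
# Second cache directory: hypothetical path, size copied from the first.
cache_dir aufs /www/apps/squid/var/cache2 4096 16 256
```

After adding a new cache_dir, stop squid and run "squid -z" once to create
the swap directories before starting it again. Keep in mind that two 4096 MB
directories double the disk space squid will try to fill.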