Thanks for the response, Adrian.
Is a recompile required to change to internal DNS?
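My understanding - please correct me if I'm wrong - is that the internal
resolver is the default, and a rebuild is only needed if Squid was built
with --disable-internal-dns; in that case something like the following
(resolver addresses below are placeholders):

  # reconfigure without --disable-internal-dns, then rebuild
  ./configure [existing options, minus --disable-internal-dns]
  make && make install

  # squid.conf: optionally point the internal resolver at specific resolvers
  dns_nameservers 192.0.2.1 192.0.2.2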
I've disabled ECN, pmtu_disc and mtu_probing.
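For reference, roughly how those were turned off (sysctl names from the
running 2.6 kernel; values from memory):

  # disable ECN, path-MTU discovery and TCP MTU probing
  sysctl -w net.ipv4.tcp_ecn=0
  sysctl -w net.ipv4.ip_no_pmtu_disc=1
  sysctl -w net.ipv4.tcp_mtu_probing=0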
cache_dir is as follows:
(recommended by Henrik)
cache_dir aufs /squid0 125000 128 256
cache_dir aufs /squid1 125000 128 256
cache_dir aufs /squid2 125000 128 256
cache_dir aufs /squid3 125000 128 256
cache_dir aufs /squid4 125000 128 256
cache_dir aufs /squid5 125000 128 256
cache_dir aufs /squid6 125000 128 256
cache_dir aufs /squid7 125000 128 256
No peak data available, here's some pre-peak data:
5-MINUTE AVERAGE
sample_start_time = 1222199580.85434 (Tue, 23 Sep 2008 19:53:00 GMT)
sample_end_time = 1222199905.507274 (Tue, 23 Sep 2008 19:58:25 GMT)
client_http.requests = 268.239526/sec
client_http.hits = 111.741117/sec
client_http.errors = 0.000000/sec
iostat shows lots of idle time - I'm not clear on what you mean by
"profiling"?
Also, I have not tried running without any cache - can you explain
how that's done?
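My guess - correct me if I'm wrong - is that it's a matter of swapping the
aufs cache_dir lines for the null store (assuming that store type was
compiled in), e.g.:

  # comment out the aufs cache_dir lines and use the null store so nothing
  # is read from or written to disk; the directory argument is unused
  cache_dir null /tmp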
Appreciate the assistance.
-Ryan
Adrian Chadd wrote:
Firstly, you should use the internal DNS code instead of the external
DNS helpers.
Secondly, I'd do a little debugging to see if it's network related -
make sure you've disabled PMTU for example, as WCCP doesn't redirect
the ICMP needed. Other things like Window scaling negotiation and such
may contribute.
From a server side of things, what cache_dir config are you using?
What's your average/peak request rate? What about disk IO? Have you
done any profiling? Have you tried running the proxy without any disk
cache to see if the problem goes away?
~ terabyte of cache is quite large; I don't think any developers have
a terabyte of storage in a box this size in a testing environment.
2008/9/24 Ryan Goddard <rgoddard@xxxxxxxxxxxx>:
Squid 2.7.STABLE1-20080528 on Debian Linux 2.6.19.7
running on quad dual-core 2.6GHz Opterons with 32GB of RAM; 8x140GB disk
partitions
using WCCP L2 redirects transparently from a Cisco 4948 GigE switch
Server has one GigE NIC for the incoming redirects and two GigE NICs for
outbound http requests.
Using iptables to port-forward HTTP to Squid; no ICP, auth, etc.; strictly a
web cache using heap LFUDA replacement
and 16GB of memory allocated, with memory pools on and no limit.
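For reference, the redirect and replacement/memory settings are roughly as
follows (a sketch from memory; the interface name and Squid port are
illustrative):

  # iptables on the Squid box: send redirected port-80 traffic to Squid
  iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128

  # squid.conf
  http_port 3128 transparent
  cache_replacement_policy heap LFUDA
  cache_mem 16 GB
  memory_pools on
  memory_pools_limit none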
Used in an ISP environment, accommodating approx. 8k predominantly cable-modem
customers during peak.
The issue we're experiencing is some web pages taking in excess of 20 seconds
to load, and marked latency for customers
running web-based speed tests, etc.
cache.log and access.log aren't indicating any errors or timeouts; the system
runs 96 DNS helper instances and 32k file descriptors
(neither has been maxed out yet).
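For completeness, the DNS helper and descriptor settings are roughly (from
memory):

  # squid.conf: 96 external dnsserver helper processes
  dns_children 96
  # file-descriptor limit raised in the init script before Squid starts
  ulimit -n 32000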
General Runtime Info from Cachemgr taken during pre-peak usage:
Start Time: Tue, 23 Sep 2008 18:07:37 GMT
Current Time: Tue, 23 Sep 2008 21:00:49 GMT
Connection information for squid:
Number of clients accessing cache: 3382
Number of HTTP requests received: 2331742
Number of ICP messages received: 0
Number of ICP messages sent: 0
Number of queued ICP replies: 0
Request failure ratio: 0.00
Average HTTP requests per minute since start: 13463.4
Average ICP messages per minute since start: 0.0
Select loop called: 11255153 times, 0.923 ms avg
Cache information for squid:
Request Hit Ratios: 5min: 42.6%, 60min: 40.0%
Byte Hit Ratios: 5min: 21.2%, 60min: 18.6%
Request Memory Hit Ratios: 5min: 18.3%, 60min: 17.2%
Request Disk Hit Ratios: 5min: 33.6%, 60min: 33.3%
Storage Swap size: 952545580 KB
Storage Mem size: 8237648 KB
Mean Object Size: 40.43 KB
Requests given to unlinkd: 0
Median Service Times (seconds) 5 min 60 min:
HTTP Requests (All): 0.19742 0.12106
Cache Misses: 0.27332 0.17711
Cache Hits: 0.08265 0.03622
Near Hits: 0.27332 0.16775
Not-Modified Replies: 0.02317 0.00865
DNS Lookups: 0.09535 0.04854
ICP Queries: 0.00000 0.00000
Resource usage for squid:
UP Time: 10391.501 seconds
CPU Time: 4708.150 seconds
CPU Usage: 45.31%
CPU Usage, 5 minute avg: 33.29%
CPU Usage, 60 minute avg: 33.36%
Process Data Segment Size via sbrk(): 1041332 KB
Maximum Resident Size: 0 KB
Page faults with physical i/o: 4
Memory usage for squid via mallinfo():
Total space in arena: 373684 KB
Ordinary blocks: 372642 KB 809 blks
Small blocks: 0 KB 0 blks
Holding blocks: 216088 KB 21 blks
Free Small blocks: 0 KB
Free Ordinary blocks: 1041 KB
Total in use: 588730 KB 100%
Total free: 1041 KB 0%
Total size: 589772 KB
Memory accounted for:
Total accounted: 11355185 KB
memPoolAlloc calls: 439418241
memPoolFree calls: 378603777
File descriptor usage for squid:
Maximum number of file descriptors: 32000
Largest file desc currently in use: 9171
Number of file desc currently in use: 8112
Files queued for open: 2
Available number of file descriptors: 23886
Reserved number of file descriptors: 100
Store Disk files open: 175
IO loop method: epoll
Internal Data Structures:
23570637 StoreEntries
532260 StoreEntries with MemObjects
531496 Hot Object Cache Items
23561001 on-disk objects
Generated Tue, 23 Sep 2008 21:00:47 GMT, by
cachemgr.cgi/2.7.STABLE1-20080528@xxxxxxxxxxxxxxxxxx
tcpdump shows packets traversing all interfaces as expected; bandwidth to
both upstream providers isn't being maxed,
and when Squid is shut down, HTTP traffic loads much faster and without any
noticeable delay.
Where/what else can I look at for the cause of the latency? It becomes
significantly worse during peak use - but since
we're not being choked on bandwidth and things greatly improve when I shut
down Squid, that narrows it to something
on the server. Is the amount of activity overloading a single Squid
process? I'm not seeing any I/O errors in the logs and haven't
found any evidence the kernel is under distress.
Any pointers are greatly appreciated.
thanks
-Ryan