uhm, "running without cache" would mean "don't use any disk storage".

I'd suggest trying to run squid with no aufs cache_dir lines, just the
null line (cache_dir null /). This rules out the disk storage as a
potential candidate for failure.

Adrian

2008/9/25 Ryan Goddard <rgoddard@xxxxxxxxxxxx>:
>
> Thanks for the response, Adrian.
> Is a recompile required to change to internal DNS?
> I've disabled ECN, pmtu_disc and mtu_probing.
> cache_dir is as follows (recommended by Henrik):
>>
>> cache_dir aufs /squid0 125000 128 256
>> cache_dir aufs /squid1 125000 128 256
>> cache_dir aufs /squid2 125000 128 256
>> cache_dir aufs /squid3 125000 128 256
>> cache_dir aufs /squid4 125000 128 256
>> cache_dir aufs /squid5 125000 128 256
>> cache_dir aufs /squid6 125000 128 256
>> cache_dir aufs /squid7 125000 128 256
>
> No peak data available; here's some pre-peak data:
>
> Cache Manager menu
> 5-MINUTE AVERAGE
> sample_start_time = 1222199580.85434 (Tue, 23 Sep 2008 19:53:00 GMT)
> sample_end_time = 1222199905.507274 (Tue, 23 Sep 2008 19:58:25 GMT)
> client_http.requests = 268.239526/sec
> client_http.hits = 111.741117/sec
> client_http.errors = 0.000000/sec
>
> iostat shows lots of idle time - I'm unclear what you mean by "profiling"?
> Also, I have not tried running without any cache - can you explain
> how this is done?
>
> Appreciate the assistance.
> -Ryan
>
>
> Adrian Chadd wrote:
>>
>> Firstly, you should use the internal DNS code instead of the external
>> DNS helpers.
>>
>> Secondly, I'd do a little debugging to see if it's network related -
>> make sure you've disabled PMTU discovery, for example, as WCCP doesn't
>> redirect the ICMP needed. Other things like window scaling negotiation
>> may also contribute.
>>
>> From a server-side point of view, what cache_dir config are you using?
>>
>> What's your average/peak request rate? What about disk I/O? Have you
>> done any profiling? Have you tried running the proxy without any disk
>> cache to see if the problem goes away?
>>
>> ~ terabyte of cache is quite large; I don't think any developers have
>> a terabyte of storage in a box this size in a testing environment.
>>
>> 2008/9/24 Ryan Goddard <rgoddard@xxxxxxxxxxxx>:
>>>
>>> Squid 2.7.STABLE1-20080528 on Debian Linux 2.6.19.7,
>>> running on quad dual-core 2.6GHz Opterons with 32 GB RAM and 8x140GB
>>> disk partitions,
>>> using WCCP L2 redirects transparently from a Cisco 4948 GigE switch.
>>>
>>> The server has one GigE NIC for the incoming redirects and two GigE
>>> NICs for outbound HTTP requests.
>>> Using iptables to port-forward HTTP to Squid; no ICP, auth, etc.;
>>> strictly a web cache using heap/LFUDA replacement
>>> and 16GB of memory allocated, with memory pools on, no limit.
>>>
>>> Used in an ISP environment, accommodating approx. 8k predominantly
>>> cable modem customers during peak.
>>>
>>> The issue we're experiencing is some web pages taking in excess of 20
>>> seconds to load, and marked latency for customers
>>> running web-based speed tests, etc.
>>> cache.log and access.log aren't indicating any errors or timeouts; the
>>> system runs 96 DNS instances and 32k file descriptors
>>> (neither has been maxed out yet).
>>> General Runtime Info from Cachemgr, taken during pre-peak usage:
>>>
>>> Start Time:   Tue, 23 Sep 2008 18:07:37 GMT
>>> Current Time: Tue, 23 Sep 2008 21:00:49 GMT
>>>
>>> Connection information for squid:
>>>     Number of clients accessing cache:      3382
>>>     Number of HTTP requests received:       2331742
>>>     Number of ICP messages received:        0
>>>     Number of ICP messages sent:            0
>>>     Number of queued ICP replies:           0
>>>     Request failure ratio:                  0.00
>>>     Average HTTP requests per minute since start:   13463.4
>>>     Average ICP messages per minute since start:    0.0
>>>     Select loop called: 11255153 times, 0.923 ms avg
>>> Cache information for squid:
>>>     Request Hit Ratios:         5min: 42.6%, 60min: 40.0%
>>>     Byte Hit Ratios:            5min: 21.2%, 60min: 18.6%
>>>     Request Memory Hit Ratios:  5min: 18.3%, 60min: 17.2%
>>>     Request Disk Hit Ratios:    5min: 33.6%, 60min: 33.3%
>>>     Storage Swap size:          952545580 KB
>>>     Storage Mem size:           8237648 KB
>>>     Mean Object Size:           40.43 KB
>>>     Requests given to unlinkd:  0
>>> Median Service Times (seconds)  5 min   60 min:
>>>     HTTP Requests (All):   0.19742  0.12106
>>>     Cache Misses:          0.27332  0.17711
>>>     Cache Hits:            0.08265  0.03622
>>>     Near Hits:             0.27332  0.16775
>>>     Not-Modified Replies:  0.02317  0.00865
>>>     DNS Lookups:           0.09535  0.04854
>>>     ICP Queries:           0.00000  0.00000
>>> Resource usage for squid:
>>>     UP Time:    10391.501 seconds
>>>     CPU Time:   4708.150 seconds
>>>     CPU Usage:  45.31%
>>>     CPU Usage, 5 minute avg:    33.29%
>>>     CPU Usage, 60 minute avg:   33.36%
>>>     Process Data Segment Size via sbrk(): 1041332 KB
>>>     Maximum Resident Size: 0 KB
>>>     Page faults with physical i/o: 4
>>> Memory usage for squid via mallinfo():
>>>     Total space in arena:   373684 KB
>>>     Ordinary blocks:        372642 KB    809 blks
>>>     Small blocks:                0 KB      0 blks
>>>     Holding blocks:         216088 KB     21 blks
>>>     Free Small blocks:           0 KB
>>>     Free Ordinary blocks:     1041 KB
>>>     Total in use:           588730 KB 100%
>>>     Total free:               1041 KB 0%
>>>     Total size:             589772 KB
>>> Memory accounted for:
>>>     Total accounted:    11355185 KB
>>>     memPoolAlloc calls: 439418241
>>>     memPoolFree calls:  378603777
>>> File descriptor usage for squid:
>>>     Maximum number of file descriptors:   32000
>>>     Largest file desc currently in use:   9171
>>>     Number of file desc currently in use: 8112
>>>     Files queued for open:                2
>>>     Available number of file descriptors: 23886
>>>     Reserved number of file descriptors:  100
>>>     Store Disk files open:                175
>>>     IO loop method:                       epoll
>>> Internal Data Structures:
>>>     23570637 StoreEntries
>>>       532260 StoreEntries with MemObjects
>>>       531496 Hot Object Cache Items
>>>     23561001 on-disk objects
>>>
>>> Generated Tue, 23 Sep 2008 21:00:47 GMT, by
>>> cachemgr.cgi/2.7.STABLE1-20080528@xxxxxxxxxxxxxxxxxx
>>>
>>>
>>> tcpdump shows packets traversing all interfaces as expected; bandwidth
>>> to both upstream providers isn't being maxed out,
>>> and when Squid is shut down, HTTP traffic loads much faster and without
>>> any noticeable delay.
>>>
>>> Where/what else can I look at for the cause of the latency? It becomes
>>> significantly worse during peak use - but as
>>> we're not being choked on bandwidth, and things greatly improve when I
>>> shut down Squid, that narrows it to something
>>> on the server. Is the amount of activity overloading a single Squid
>>> process? I'm not seeing any I/O errors in the logs and haven't
>>> found any evidence the kernel is under distress.
>>> Any pointers are greatly appreciated.
>>> thanks
>>> -Ryan
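[Editor's note] The "disabled ECN, pmtu_disc and mtu_probing" step mentioned in the thread maps, on a 2.6-series Linux kernel like the one described, to sysctls along these lines. This is a sketch; verify the exact names against your kernel before applying:

```
# /etc/sysctl.conf fragment (assumed mapping; check with: sysctl -a | grep -E 'ecn|pmtu|mtu_probing')
net.ipv4.tcp_ecn = 0            # disable Explicit Congestion Notification
net.ipv4.ip_no_pmtu_disc = 1    # disable Path MTU discovery (WCCP doesn't redirect the ICMP it needs)
net.ipv4.tcp_mtu_probing = 0    # disable TCP MTU probing
```

Apply with `sysctl -p`.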
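[Editor's note] Adrian's no-disk-cache test could look like the squid.conf fragment below. Note this is a sketch: in Squid 2.x the null store type generally has to be compiled in (i.e. included in the `--enable-storeio` list at build time), so check `squid -v` for the available store modules first:

```
# Diagnostic squid.conf: rule out disk storage entirely.
# Comment out every aufs cache_dir line, e.g.:
#cache_dir aufs /squid0 125000 128 256
# ... and so on for /squid1 through /squid7, then use only the null store:
cache_dir null /
```

If latency disappears with this config, the aufs disk stores are the prime suspect.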
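[Editor's note] As a quick consistency check on the cachemgr report above, the reported Mean Object Size is simply Storage Swap size divided by the on-disk object count:

```python
# Figures copied from the "General Runtime Info" report in the thread.
swap_size_kb = 952_545_580    # Storage Swap size
on_disk_objects = 23_561_001  # on-disk objects

mean_object_kb = swap_size_kb / on_disk_objects
print(f"Mean object size: {mean_object_kb:.2f} KB")  # matches the reported 40.43 KB
```

The ~23.5 million StoreEntries are also worth noting: the in-memory store index for a near-terabyte cache of small objects is itself a significant memory consumer.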